Abstract. The key-generation algorithm for the RSA cryptosystem is specified in
several standards, such as PKCS#1, IEEE 1363-2000, FIPS 186-3, ANSI X9.44, or
ISO/IEC 18033-2. All of them differ substantially in their requirements. This indicates
that for computing a "secure" RSA modulus it does not matter how exactly one
generates RSA integers. In this work we show that this is indeed the case to a large
extent: First, we give a theoretical framework that will enable us to easily compute
the entropy of the output distribution of the considered standards and show that it is
comparatively high. To do so, we compute for each standard the number of integers
they define (up to an error of very small order) and discuss different methods of
generating integers of a specific form. Second, we show that factoring such integers is
hard, provided factoring a product of two primes of similar size is hard.

1. Introduction



An RSA integer is an integer that is suitable as a modulus for the RSA cryptosystem as
proposed by Rivest, Shamir & Adleman (1977, 1978):

"You first compute n as the product of two primes p and q:

n = p · q.

These primes are very large, 'random' primes. Although you will make n public,
the factors p and q will be effectively hidden from everyone else due to the
enormous difficulty of factoring n."

Also in earlier literature such as Ellis (1970) or Cocks (1973) one does not find any further 
restrictions. In subsequent literature people define RSA integers similarly to Rivest, Shamir 
& Adleman: Crandall & Pomerance (2001) note that it is "fashionable to select approxi- 
mately equal primes but sometimes one runs some further safety tests". In more applied 
works such as Schneier (1996) or Menezes et al. (1997) one can read that for maximum
security one chooses two (distinct) primes of equal length. Also von zur Gathen & Gerhard
(2003) follow a similar approach. On suggestion of B. de Weger, Decker & Moree (2008)
define an RSA integer to be a product of two primes p and q such that p < q < rp for some
parameter $r \in \mathbb{R}_{>1}$. Real world implementations, however, require concrete algorithms
that specify in detail how to generate RSA integers. This has led to a variety of standards, 
notably the standards PKCS#1 (Jonsson & Kaliski 2003), ISO 18033-2 (International Or- 
ganization for Standards 2006), IEEE 1363-2000 (IEEE working group 2000), ANSI X9.44 
(Accredited Standards Committee X9 2007), FIPS 186-3 (NIST 2009), the standard of the 
RSA foundation (RSA Laboratories 2000), the standard set by the German Bundesnetza- 
gentur (Wohlmacher 2009), and the standard resulting from the European NESSIE project 
(Preneel et al. 2003). All of those standards define more or less precisely how to generate 
RSA integers and all of them have substantially different requirements. This reflects the in- 
tuition that it does not really matter how one selects the prime factors in detail, the resulting 
RSA modulus will do its job. But what is needed to show that this is really the case? 

Following Brandt & Damgård (1993) a quality measure of a generator is the entropy of
its output distribution. In abuse of language we will most of the time talk about the output 
entropy of an algorithm. To compute it, we need estimates of the probability that a certain 
outcome is produced. This in turn needs a thorough analysis of how one generates RSA 
integers of a specific form. If we can show that the outcome of the algorithm is roughly 
uniformly distributed, the output entropy is closely related to the count of RSA integers it 
can produce. It will turn out that in all reasonable setups this count is essentially determined 
by the desired length of the output, see Section 5. For primality tests there are several results 
in this direction (see for example Joye & Paillier 2006) but we are not aware of any related 
work analyzing the output entropy of algorithms for generating RSA integers. 

Another requirement for the algorithm is that the output should be 'hard to factor'. Since
this statement does not even make sense for a single integer, this means that one has to show 
that the restrictions on the shape of the integers the algorithm produces do not introduce any 
further possibilities for an attacker. To prove this, a reduction has to be given that reduces 
the problem of factoring the output to the problem of factoring a product of two primes of 
similar size, see Section 8. Also there it is necessary to have results on the count of RSA 
integers of a specific form to make the reduction work. As for the entropy estimations, we 
do not know any related work on this. A conference version of this article, focusing on 
the analysis of standardized RSA key-generators only, was published in Loebenberger &
Nüsken (2011).

In the following section we will develop a formal framework that can handle all possi- 
ble definitions for RSA integers. After discussing the necessary number theoretic tools in 
Section 3, we give explicit formulae for the count of such integers which will be used later 
for entropy estimations of the various standards for RSA integers. In Section 4 we show 
how our general framework can be instantiated, yielding natural definitions for several types 
of RSA integers (as used later in the standards). The section afterwards compares in more 
detail the relations of the different notions. Section 6 gives a short overview on generic 
constructions for fast algorithms that generate such integers almost uniformly. At this point
we will have described all necessary techniques to compute the output entropy, which we
discuss in Section 7. The following section resolves the second question described above 
by giving a reduction from factoring special types of RSA integers to factoring a product of 
two primes of similar size. We finish by applying our results to various standards for RSA 
integers in Section 9. 



2. RSA integers in general 

If one generates an RSA integer it is necessary to select for each choice of the security 
parameter the prime factors from a certain region. This security parameter is typically an 
integer k that specifies (roughly) the size of the output. We use a more general definition 
by asking for integers from the interval ]x/r,x], given a real bound x and a parameter r 
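(possibly depending on x). Clearly, this can also be used to model the former selection
process by setting $x = 2^k - 1$ and $r = 2$. Let us in general introduce a notion of RSA
integers with tolerance r as a family

$$\mathcal{A} := \langle \mathcal{A}_x \rangle_{x \in \mathbb{R}_{>1}}$$

of subsets of the positive quadrant $\mathbb{R}^2_{>1}$, where for every $x \in \mathbb{R}_{>1}$

$$\mathcal{A}_x \subseteq \{ (y, z) \in \mathbb{R}^2_{>1} \mid \tfrac{x}{r} < yz \le x \}.$$

The tolerance r shall always be larger than 1. We allow here that r varies with x, which of
course includes the case when r is a constant. Typical values used for RSA are r = 2 or
r = 4, which fix the bit-length of the modulus more or less. Now an $\mathcal{A}$-integer n of size x,
for use as a modulus in RSA, is a product n = pq of a prime pair $(p, q) \in \mathcal{A}_x \cap (\mathbb{P} \times \mathbb{P})$,
where $\mathbb{P}$ denotes the set of primes. They are counted by the associated prime pair counting
function $\#\mathcal{A}$ for the notion $\mathcal{A}$:

$$\#\mathcal{A}\colon \mathbb{R}_{>1} \to \mathbb{N}, \qquad x \mapsto \#\{ (p, q) \in \mathbb{P} \times \mathbb{P} \mid (p, q) \in \mathcal{A}_x \}.$$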

Thus every $\mathcal{A}$-integer n = pq is counted once or twice in $\#\mathcal{A}(x)$ depending on whether only
$(p, q) \in \mathcal{A}_x$ or also $(q, p) \in \mathcal{A}_x$, respectively. We call a notion symmetric if for all choices
of the parameters the corresponding area in the (y, z)-plane is symmetric with respect to the
main diagonal, i.e. that $(y, z) \in \mathcal{A}_x$ implies also $(z, y) \in \mathcal{A}_x$. If to the contrary $(y, z) \in \mathcal{A}_x$
implies $(z, y) \notin \mathcal{A}_x$ we call the notion antisymmetric. When we are only interested in
the associated RSA integers we can always require symmetry or antisymmetry, yet many 
algorithms proceed in an asymmetric way. 
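The counting function can be made concrete by brute force for small x. The following is a minimal sketch, not part of the original text: the helper names `primes_up_to`, `count_prime_pairs`, `full_region` and `upper_half` are hypothetical, and the tolerance r is assumed to be an integer so that all comparisons stay exact.

```python
from math import isqrt


def primes_up_to(n):
    """Return all primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            # Cross out the multiples of i, starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def count_prime_pairs(x, in_region):
    """Brute-force #A(x): ordered prime pairs (p, q) with pq <= x inside the region."""
    primes = primes_up_to(int(x) // 2)  # pq <= x and q >= 2 force p <= x/2
    count = 0
    for p in primes:
        for q in primes:
            if p * q > x:
                break  # primes are sorted, so larger q only overshoot further
            if in_region(p, q, x):
                count += 1
    return count


def full_region(r):
    """Hypothetical example region: x/r < yz <= x, a symmetric notion."""
    return lambda y, z, x: x < r * y * z and y * z <= x


def upper_half(r):
    """Hypothetical example region: the same, restricted to y < z (antisymmetric)."""
    return lambda y, z, x: y < z and x < r * y * z and y * z <= x


if __name__ == "__main__":
    x, r = 10_000, 2
    print(count_prime_pairs(x, full_region(r)), count_prime_pairs(x, upper_half(r)))
```

In the symmetric region every product of two distinct primes is counted twice and every prime square once, while the antisymmetric restriction counts each product of distinct primes exactly once.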

Note that a varying r does not occur in the standards and implementations for RSA integers
analyzed in Section 9. However, there are still quite natural notions in which a varying r
occurs: Consider for example the notion where the primes p, q are selected from the interval
$]x^{1/4}, x^{1/2}]$. Then we obtain the product $pq \in\, ]x^{1/2}, x]$. This corresponds to the notion
discussed in Section 4.2 with $r = \sqrt{x}$. Indeed, all the counting theorems in Section 4 can
handle such large r. However, the error term is correspondingly large.

Figure 2.1: A generic notion of RSA integers with tolerance r. The gray area shows the
part of the (ln y, ln z)-plane which is counted. It lies between the tolerance bounds $\ln x$
and $\ln\frac{x}{r}$. The dashed lines show boundaries as imposed by $[c_1, c_2]$-balanced. The dotted
diagonal marks the criterion for symmetry.

Certainly, we will also need restrictions on the shape of the area we are analyzing: If one 
considers any notion of RSA integers and throws out exactly the prime pairs one would be 
left with a prime-pair-free region and any approximation for the count of such a notion based 
on the area would necessarily have a tremendously large error term. However, for practical 
applications it turns out that it is enough to consider regions of a very specific form. Actually, 
we will most of the time have regions whose boundary can be described by graphs of certain 
smooth functions, see Definition 3.3(ii).

For RSA, people usually prefer two prime factors of roughly the same size, where size
is understood as bit length. Accordingly, we call a notion of RSA integers $[c_1, c_2]$-balanced
iff additionally for every $x \in \mathbb{R}_{>1}$

$$\mathcal{A}_x \subseteq \{ (y, z) \in \mathbb{R}^2_{>1} \mid y, z \in [x^{c_1}, x^{c_2}] \},$$

where $0 < c_1 \le c_2$ can be thought of as constants or, more generally, as smooth functions
in x defining the amount of allowed divergence subject to the side condition that $x^{c_1}$
tends to infinity when x grows. If $c_1 > \frac{1}{2}$ then $\mathcal{A}_x$ is empty, so we will usually assume
$c_1 \le \frac{1}{2}$. In order to prevent trial division from being a successful attack it would be
sufficient to require $y, z \in \Omega(\ln^k x)$ for every $k \in \mathbb{N}$. Our stronger requirement still seems
reasonable and indeed equals the condition Maurer (1995) required for secure RSA moduli,
as the supposedly most difficult factoring challenges stay within the range of our attention.
As a side-effect this greatly simplifies our approximations later. The German
Bundesnetzagentur uses a very similar restriction in their algorithm catalog (Wohlmacher 2009).
We can, for a fixed choice of parameters, easily visualize any notion of RSA integers by the
corresponding region $\mathcal{A}_x$ in the (y, z)-plane. It is favorable to look at these regions in
logarithmic scale: writing $y = e^{\nu}$ and $z = e^{\zeta}$, we depict the region $(\ln\mathcal{A})_x$ in the $(\nu, \zeta)$-plane
corresponding to the region $\mathcal{A}_x$ in the (y, z)-plane, i.e. $(\nu, \zeta) \in (\ln\mathcal{A})_x :\Leftrightarrow (y, z) \in \mathcal{A}_x$. We
obtain a picture like in Figure 2.1.

Figure 2.2: Levels $e^k$ of the function $\frac{e^{\nu+\zeta}}{\nu\zeta}$ for $k \in \{2 + \varepsilon, 3, \dots, 8\}$. The darker the line the
higher is the value of k.

Often the considered integers n = pq are also subject to further side conditions, like
$\gcd((p-1)(q-1), e) = 1$ for some fixed public RSA exponent e. Most of the number
theoretic work below can easily be adapted, but for simplicity of exposition we will often 
present our results without those further restrictions and just point out when necessary how 
to incorporate such additional properties. 

In Wohlmacher (2009) it is additionally required that the primes p and q are not too close 
to each other. We ignore this issue here, since the probability that two primes are very close 
to each other would be tiny if the notion from which (p, q) was selected is sufficiently large. 
If necessary, we are able to modify our notions such that also this requirement is met. 

In order to count the number of $\mathcal{A}$-integers we have to evaluate

$$\#\mathcal{A}(x) = \sum_{(p,q) \in \mathcal{A}_x \cap (\mathbb{P} \times \mathbb{P})} 1.$$

If we follow the intuitive view that a randomly generated number n is prime with probability
$\frac{1}{\ln n}$, we expect that we have to evaluate integrals like

$$\iint_{\mathcal{A}_x} \frac{1}{\ln y \, \ln z} \, dz \, dy,$$

while carefully considering the error between those integrals and the above sums. In
logarithmic scale we obtain expressions of the form $\iint_{(\ln\mathcal{A})_x} \frac{e^{\nu+\zeta}}{\nu\zeta} \, d\zeta \, d\nu$. To get an understanding
of these functions, in Figure 2.2 some contour lines of the inner function are depicted. From
the figure we observe that pairs $(\nu, \zeta)$ where $\nu + \zeta$ is large have a higher weight in the overall
count.
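The passage to logarithmic scale is the substitution $y = e^{\nu}$, $z = e^{\zeta}$, which turns $\frac{1}{\ln y \ln z}\,dz\,dy$ into $\frac{e^{\nu+\zeta}}{\nu\zeta}\,d\zeta\,d\nu$. The sketch below checks this numerically on a square region; the region $[100, 1000]^2$, the midpoint rule and all function names are assumptions made only for illustration.

```python
from math import exp, log


def midpoint_2d(f, a, b, n=400):
    """Midpoint rule for the integral of f over the square [a, b] x [a, b]."""
    h = (b - a) / n
    mids = [a + (i + 0.5) * h for i in range(n)]
    return h * h * sum(f(u, v) for u in mids for v in mids)


def integral_linear(a, b):
    """Integral of 1 / (ln y * ln z) over [a, b]^2 in the (y, z)-plane."""
    return midpoint_2d(lambda y, z: 1.0 / (log(y) * log(z)), a, b)


def integral_log(a, b):
    """The same integral after substituting y = e^nu, z = e^zeta."""
    return midpoint_2d(lambda nu, zeta: exp(nu + zeta) / (nu * zeta), log(a), log(b))


if __name__ == "__main__":
    print(integral_linear(100.0, 1000.0), integral_log(100.0, 1000.0))
```

Both values agree up to the discretization error of the midpoint rule, which illustrates why points with large $\nu + \zeta$ carry more weight in logarithmic scale.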

As we usually deal with balanced notions the considered regions are somewhat centered 
around the main diagonal. We will show in Section 8 that if factoring products of two primes 
is hard then it is also hard to factor integers generated from such notions. 



3. Toolbox 

We will now develop the necessary number theoretic concepts to obtain formulae for the
count of RSA integers that will later help us to estimate the output entropy of the various
standards for RSA integers. In related articles, like Decker & Moree (2008), one finds counts
for one particular definition of RSA integers. We believe that in the work presented here for
the first time a sufficiently general theorem is established that allows one to compute the number
of RSA integers for all reasonable definitions.

We assume the Riemann hypothesis throughout the entire paper. The main terms are the 
same without this assumption, but the error bounds one obtains are then much weaker. We 
use the following version of the prime number theorem: 

Prime number theorem 3.1 (Von Koch 1901, Schoenfeld 1976). If (and only if) the
Riemann hypothesis holds, then for $x \ge 2657$

$$|\pi(x) - \mathrm{li}(x)| < \frac{1}{8\pi} \sqrt{x} \ln x,$$

where $\mathrm{li}(x) := \int_0^x \frac{dt}{\ln t}$.
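For small x the bound of Theorem 3.1 can be checked directly. The following is a minimal sketch under stated assumptions: the logarithmic integral is evaluated with the classical series $\mathrm{li}(x) = \gamma + \ln\ln x + \sum_{k \ge 1} \frac{(\ln x)^k}{k \cdot k!}$ (valid for x > 1), and the helper names are hypothetical. A numerical check of this kind of course says nothing about the Riemann hypothesis itself.

```python
from math import isqrt, log, pi, sqrt

EULER_GAMMA = 0.5772156649015329


def primes_up_to(n):
    """Return all primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def li(x):
    """Logarithmic integral for x > 1 via its series in t = ln x."""
    t = log(x)
    total = EULER_GAMMA + log(t)
    term = 1.0
    for k in range(1, 200):
        term *= t / k          # term now equals t^k / k!
        total += term / k      # add t^k / (k * k!)
    return total


def schoenfeld_bound(x):
    """Right-hand side of the estimate: sqrt(x) * ln(x) / (8 * pi)."""
    return sqrt(x) * log(x) / (8 * pi)


def satisfies_bound(x):
    """Check |pi(x) - li(x)| < sqrt(x) ln(x) / (8 pi) for one value of x."""
    return abs(len(primes_up_to(int(x))) - li(x)) < schoenfeld_bound(x)


if __name__ == "__main__":
    for x in (10**4, 10**5, 10**6):
        print(x, len(primes_up_to(x)), round(li(x), 2), round(schoenfeld_bound(x), 2))
```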

We first state a quite technical lemma that enables us to do our approximations: 

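Lemma 3.2 (Prime sum approximation). Let $f$, $\tilde f$, $\hat f$ be functions $[B, C] \to \mathbb{R}_{\ge 0}$, where
$B, C \in \mathbb{R}_{>1}$, such that $\tilde f$ and $\hat f$ are piecewise continuous, $\tilde f + \hat f$ is either weakly decreasing,
weakly increasing, or constant, and for $p \in [B, C]$ we have the estimate

$$|f(p) - \tilde f(p)| \le \hat f(p).$$

Further, let E(p) be a positive valued, continuously differentiable function of p bounding
$|\pi(p) - \mathrm{li}(p)|$ on $[B, C]$. (For example, under the Riemann hypothesis we can take
$E(p) = \frac{1}{8\pi}\sqrt{p}\ln p$ provided $B \ge 2657$.) Then

$$\Bigl| \sum_{p \in \mathbb{P} \cap ]B, C]} f(p) - \tilde g \Bigr| \le \hat g$$

with

$$\tilde g = \int_B^C \frac{\tilde f(p)}{\ln p} \, dp,$$

$$\hat g = \int_B^C \frac{\hat f(p)}{\ln p} \, dp + 2(\tilde f + \hat f)(B)E(B) + 2(\tilde f + \hat f)(C)E(C) + \int_B^C (\tilde f + \hat f)(p) E'(p) \, dp.$$

In the special case when $\tilde f + \hat f$ is constant we have the better bound

$$\hat g = \int_B^C \frac{\hat f(p)}{\ln p} \, dp + (\tilde f + \hat f)(B)\,(E(B) + E(C)).$$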

Proof. The proof can be done analogously to the proof of Lemma 2.1 in Loebenberger
& Nüsken (2010): First, rewrite $\sum_{p \in \mathbb{P} \cap ]B, C]} f(p)$ as a Stieltjes integral $\int_B^C f(p) \, d\pi(p)$. Then
integrate by parts, estimate $\pi$, and finally integrate by parts 'backwards'. □
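To see the lemma at work, one can take $f(p) = \tilde f(p) = \ln p$ and $\hat f = 0$, so that the main term is simply $\tilde g = \int_B^C 1 \, dp = C - B$. The sketch below compares the prime sum with this main term; the choice of f, the interval and the helper names are assumptions for illustration, and only the two boundary contributions of $\hat g$ are evaluated (the remaining integral is non-negative, so this is a smaller quantity than the full bound).

```python
from math import isqrt, log, pi, sqrt


def primes_up_to(n):
    """Return all primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def E(p):
    """Error bound for |pi(p) - li(p)| under the Riemann hypothesis."""
    return sqrt(p) * log(p) / (8 * pi)


def prime_sum(f, B, C):
    """Sum of f(p) over the primes p in the half-open interval ]B, C]."""
    return sum(f(p) for p in primes_up_to(C) if p > B)


def lemma_demo(B, C):
    """Return (prime sum of ln p, main term C - B, boundary part of the error bound)."""
    actual = prime_sum(log, B, C)
    main_term = C - B
    boundary_part = 2 * log(B) * E(B) + 2 * log(C) * E(C)
    return actual, main_term, boundary_part


if __name__ == "__main__":
    print(lemma_demo(2657, 100_000))
```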

Next we formulate a lemma specialized to handle RSA notions. We cannot expect to 
obtain an approximation of the number of prime pairs by the area of the region unless we 
make certain restrictions. 

The following definition describes the restrictions that we use. As you will notice, it 
essentially enforces a certain monotonicity that allows the error estimation. 

Definition 3.3. Let $\mathcal{A}$ be a notion of RSA integers with tolerance r.

(i) The notion $\mathcal{A}$ is graph-bounded iff there are (at least) integrable boundary functions
$B_1, C_1\colon \mathbb{R}_{>1} \to \mathbb{R}_{>1}$ and $B_2, C_2\colon \mathbb{R}^2_{>1} \to \mathbb{R}_{>1}$ such that we can write

$$\mathcal{A}_x = \left\{ (y, z) \in \mathbb{R}^2_{>1} \;\middle|\; \begin{array}{l} B_1(x) < y \le C_1(x), \\ B_2(y, x) < z \le C_2(y, x) \end{array} \right\},$$

where for all $x \in \mathbb{R}_{>1}$ and all $y \in\, ]B_1(x), C_1(x)[$ we have $1 \le B_1(x) \le C_1(x) \le x$
and $1 \le B_2(y, x) < C_2(y, x) \le x$.

(ii) The notion $\mathcal{A}$ is monotone at x (relative to the error bound E) for some $x \in \mathbb{R}_{>1}$ iff it
is graph-bounded and the function

$$\int_{B_2(p,x)}^{C_2(p,x)} \frac{1}{\ln q} \, dq + E(B_2(p, x)) + E(C_2(p, x))$$

is either weakly increasing, weakly decreasing, or constant as a function in p restricted
to the interval $[B_1(x), C_1(x)]$. If not mentioned otherwise we refer to the error bound
given by $E(p) = \frac{1}{8\pi}\sqrt{p}\ln p$.

We call the notion $\mathcal{A}$ monotone iff it is monotone at each $x \in \mathbb{R}_{>1}$ where $\mathcal{A}_x \ne \emptyset$.

(iii) The notion $\mathcal{A}$ is piecewise monotone iff there is a parameter $m \in \mathbb{N}$ such that

$$\mathcal{A}_x = \biguplus_{j=1}^{m} \mathcal{A}_{j,x},$$

where $\mathcal{A}_j$ are all monotone notions of RSA integers of tolerance r. Note that we may
also allow m to depend on x.

For (i) note that $B_1(x) = C_1(x)$ allows one to describe an empty set $\mathcal{A}_x$, and otherwise the
inequality $B_2(y, x) \ne C_2(y, x)$ makes sure that all four bounding functions are determined
by $\mathcal{A}_x$ as long as $y \in\, ]B_1(x), C_1(x)[$. This condition enforces that $\mathcal{A}_x$ is (path) connected.
We do not need that but also it does no harm. For (iii) observe that in the Ught of a multi- 
application of Lemma 3.6 we would be on the safe side if we require m G X. At 



the extreme m £ o i^cix 4 Inxj with c = max (2c2 — 1, 1 — 2ci) is necessary for any 
meaningful result generalizing Lemma 3.6. As in particular (ii) is rather weird to verify we 
provide an easily checkable, sufficient condition for monotonicity of a notion. 

Lemma 3.4. Assume $\mathcal{A}$ is a graph-bounded notion of RSA integers with tolerance r given
by continuously differentiable functions $B_1, C_1\colon \mathbb{R}_{>1} \to \mathbb{R}_{>1}$ and $B_2, C_2\colon \mathbb{R}^2_{>1} \to \mathbb{R}_{>1}$.
Finally, let $x \in \mathbb{R}_{>1}$ be such that

o the function $B_2(p, x)$ is weakly decreasing in p and

o the function $C_2(p, x)$ is weakly increasing in p

for $p \in\, ]B_1(x), C_1(x)]$, or vice versa. As usual let E(p) be the function given by
$E(p) = \frac{1}{8\pi}\sqrt{p}\ln p$. Then the notion $\mathcal{A}$ is monotone at x (relative to E).

Proof. The goal is to show that the function

$$h(p) := \int_{B_2(p,x)}^{C_2(p,x)} \frac{1}{\ln q} \, dq + \frac{1}{8\pi}\left( \sqrt{B_2(p,x)} \ln B_2(p,x) + \sqrt{C_2(p,x)} \ln C_2(p,x) \right)$$

is weakly increasing or weakly decreasing in p. We write $B_2'(p, x)$ and $C_2'(p, x)$,
respectively, for the derivative with respect to p. Note that

$$h'(p) = C_2'(p,x) \underbrace{\left( \frac{1}{\ln C_2(p,x)} + \frac{2 + \ln C_2(p,x)}{16\pi\sqrt{C_2(p,x)}} \right)}_{> 0} - B_2'(p,x) \underbrace{\left( \frac{1}{\ln B_2(p,x)} - \frac{2 + \ln B_2(p,x)}{16\pi\sqrt{B_2(p,x)}} \right)}_{> 0}.$$

Some simple calculus shows that the second underbraced term is always positive since
$B_2(p, x) > 1$. Thus if $B_2(p, x)$ is weakly decreasing and $C_2(p, x)$ is weakly increasing,
we have that h(p) is weakly increasing. If on the other hand $B_2(p, x)$ is weakly increasing
and $C_2(p, x)$ is weakly decreasing it follows that h(p) is weakly decreasing. □

Clearly, the conditions of the lemma are not necessary. We can easily extend it, for example,
as follows:

Lemma 3.5. Assume $\mathcal{A}$ is a graph-bounded notion of RSA integers with tolerance r given
by continuously differentiable functions $B_1, C_1\colon \mathbb{R}_{>1} \to \mathbb{R}_{>1}$ and $B_2, C_2\colon \mathbb{R}^2_{>1} \to \mathbb{R}_{>1}$.
Further, individually for each $x \in \mathbb{R}_{>1}$, the functions $B_2(p, x)$ and $C_2(p, x)$ are both weakly
increasing in p for $p \in\, ]B_1(x), C_1(x)]$. Then there are two monotone notions $\mathcal{A}^1$ and $\mathcal{A}^2$
with tolerance r, both having $\mathcal{A}^i_x \subseteq \mathbb{R}_{>B_1(x)} \times \mathbb{R}_{>B_2(B_1(x), x)}$ for all x, such that
$\mathcal{A} = \mathcal{A}^1 \setminus \mathcal{A}^2$.

Proof. Let $A(x) := B_2(B_1(x), x)$. We define two $[c_1, c_2]$-balanced graph-bounded notions
$\mathcal{A}^1$, $\mathcal{A}^2$ of RSA integers by the following: the first notion $\mathcal{A}^1$ is defined by the functions
$B^1_1 := B_1$, $C^1_1 := C_1$, $B^1_2(p, x) := A(x)$ and $C^1_2 := C_2$. The second notion $\mathcal{A}^2$ is
defined by the functions $B^2_1 := B_1$, $C^2_1 := C_1$, $B^2_2(p, x) := A(x)$ and $C^2_2 := B_2$. Since
$x/r < B_1(x) B_2(B_1(x), x) = B_1(x) A(x)$ both new notions have tolerance r as well. Then
$\mathcal{A}^1$, $\mathcal{A}^2$ are by Lemma 3.4 both monotone and $\mathcal{A} = \mathcal{A}^1 \setminus \mathcal{A}^2$. □

A similar result with $B_2$ and $C_2$ both weakly decreasing is more difficult to obtain while
simultaneously retaining the tolerance. A particularly difficult example is the maximal
notion $\mathcal{M}^{r,c_1}$ given by $\mathcal{M}^{r,c_1}_x = \{ (y, z) \in \mathbb{R}^2_{>1} \mid \frac{x}{r} < yz \le x \wedge y, z \ge x^{c_1} \}$. The following
lemma covers all the estimation work. Notice that we could in principle obtain explicit
values for the $O()$ constant based on Lemma 3.2 but the expressions are rather ugly.

Lemma 3.6 (Two-dimensional prime sum approximation for monotone notions). Assume
that we have a monotone $[c_1, c_2]$-balanced notion $\mathcal{A}$ of RSA integers with tolerance r, where
$0 < c_1 \le c_2$. (The values r, $c_1$, $c_2$ are allowed to vary with x.) Then under the Riemann
hypothesis there is a value $\tilde a(x) \in \left[ \frac{1}{4c_2^2}, \frac{1}{4c_1^2} \right]$ such that

$$\#\mathcal{A}(x) \in \tilde a(x) \cdot \frac{4\,\mathrm{area}(\mathcal{A}_x)}{\ln^2 x} + O\left( c_1^{-1} x^{\frac{3+c}{4}} \right),$$

where $c = \max(2c_2 - 1, 1 - 2c_1)$.

Note that the following proof gives a precise expression for $\tilde a(x)$, namely

$$\tilde a(x) = \frac{\ln^2 x \iint_{\mathcal{A}_x} \frac{1}{\ln p \, \ln q} \, dp \, dq}{4 \iint_{\mathcal{A}_x} 1 \, dp \, dq}.$$

It turns out that we can only evaluate $\tilde a(x)$ numerically in our case and so we tend to estimate
also this term. Then we often obtain $\tilde a(x) \in 1 + o(1)$. Admittedly, this mostly eats up the
advantage obtained by using the Riemann hypothesis. However, we accept this because it
still leaves the option of going through that difficult evaluation and obtaining a much more
precise answer. Without the Riemann hypothesis we would have to replace $O\left( c_1^{-1} x^{\frac{3+c}{4}} \right)$
with $O\left( \frac{x}{\ln^k x} \right)$ for any $k > 2$ of your choice.
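The factor in front of the main term of Lemma 3.6 is the ratio of $\ln^2 x \iint \frac{1}{\ln p \ln q}$ to four times the area, and it can be evaluated numerically for a concrete region. The sketch below does this for a square region $]\sqrt{x/r}, \sqrt{x}]^2$, which is chosen here purely for illustration (it is balanced with $c_1 = \frac{1}{2} - \frac{\ln r}{2\ln x}$ and $c_2 = \frac{1}{2}$); the function names and the midpoint rule are assumptions as well.

```python
from math import log, sqrt


def a_tilde_square(x, r, n=10_000):
    """Numerically evaluate the factor for the square ]sqrt(x/r), sqrt(x)]^2."""
    lo, hi = sqrt(x / r), sqrt(x)
    h = (hi - lo) / n
    # On a square the double integral factors into the square of a 1D integral.
    one_dim = h * sum(1.0 / log(lo + (i + 0.5) * h) for i in range(n))
    area = (hi - lo) ** 2
    return log(x) ** 2 * one_dim ** 2 / (4.0 * area)


def balance_interval(x, r):
    """Interval [1/(4 c2^2), 1/(4 c1^2)] for the square region above."""
    c1 = 0.5 - log(r) / (2.0 * log(x))
    c2 = 0.5
    return 1.0 / (4.0 * c2 * c2), 1.0 / (4.0 * c1 * c1)


if __name__ == "__main__":
    x, r = 2.0 ** 64, 2
    print(balance_interval(x, r), a_tilde_square(x, r))
```

For a 64-bit bound and r = 2 the value lies slightly above 1, inside the interval predicted by the balance parameters.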

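Proof. Fix any $x \in \mathbb{R}_{>1}$. In case $\mathrm{area}(\mathcal{A}_x) = 0$ the claim holds with any desired $\tilde a(x)$
and zero big-Oh term. We can thus assume that the area is positive. As the statement is
asymptotic and $x^{c_1}$ tends to $\infty$ with x we can further assume that $x^{c_1} \ge 2657$. Abbreviating
$\tilde h(x) = \frac{4\,\mathrm{area}(\mathcal{A}_x)}{\ln^2 x}$, we prove that there exists a value $\tilde a(x) \in \left[ \frac{1}{4c_2^2}, \frac{1}{4c_1^2} \right]$ such that

$$\left| \#\mathcal{A}(x) - \tilde a(x) \cdot \tilde h(x) \right| \le \hat h(x)$$
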

with 

h{x) 



47rci 



12 

6C2 + — 

Inx 



1+C2 1 1 , 21nlna! 1 

X 2 +-_.a;2"^ tax +- 

OTT^ 47rci 



1 + 



Inx 



l_£i 
X 2 



This is slightly more precise and implies the claim. 

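Since the given notion is $[c_1, c_2]$-balanced with tolerance r, for any $(y, z) \in \mathcal{A}_x$ we have
$\frac{x}{r} < yz \le x$ and $y, z \in [x^{c_1}, x^{c_2}]$, which implies $\ln y, \ln z \in [c_1, c_2] \ln x$. Equivalently, we
have

$$x^{c_1} \le B_1(x) < C_1(x) \le x^{c_2} \tag{3.7}$$

and for $y \in\, ]B_1(x), C_1(x)[$ we have

$$\frac{x}{ry} \le B_2(y, x) < C_2(y, x) \le \frac{x}{y} \tag{3.8}$$

and

$$x^{c_1} \le B_2(y, x) < C_2(y, x) \le x^{c_2}. \tag{3.9}$$

From (3.8) we infer that for all $y \in\, ]B_1(x), C_1(x)[$ we have

$$\frac{x}{r} \le y B_2(y, x) \le x \quad\text{and}\quad \frac{x}{r} \le y C_2(y, x) \le x. \tag{3.10}$$

In order to estimate

$$\#\mathcal{A}(x) = \sum_{p \in \mathbb{P} \cap ]B_1(x), C_1(x)]} \;\; \sum_{q \in \mathbb{P} \cap ]B_2(p,x), C_2(p,x)]} 1$$

we apply Lemma 3.2 twice. Since $x^{c_1} \ge 2657$ and so $B_2(p, x) \ge 2657$ for the considered p
we obtain for the inner sum

$$\Bigl| \sum_{q \in \mathbb{P} \cap ]B_2(p,x), C_2(p,x)]} 1 - \tilde g_1(p, x) \Bigr| \le \hat g_1(p, x),$$

where

$$\tilde g_1(p, x) = \int_{B_2(p,x)}^{C_2(p,x)} \frac{1}{\ln q} \, dq,$$

$$\hat g_1(p, x) = E(B_2(p, x)) + E(C_2(p, x)),$$

since we can use the special case of constant functions in Lemma 3.2. Because we are working
under the restriction that the notion is monotone, i.e. $\tilde g_1(p, x) + \hat g_1(p, x)$ is monotone,
we are able to apply the lemma a second time. Since $x^{c_1} \ge 2657$ and so $B_1(x) \ge 2657$ we
obtain

$$\Bigl| \sum_{p \in \mathbb{P} \cap ]B_1(x), C_1(x)]} \;\; \sum_{q \in \mathbb{P} \cap ]B_2(p,x), C_2(p,x)]} 1 - \tilde g_2(x) \Bigr| \le \hat g_2(x),$$

where

$$\tilde g_2(x) = \int_{B_1(x)}^{C_1(x)} \int_{B_2(p,x)}^{C_2(p,x)} \frac{1}{\ln p \, \ln q} \, dq \, dp,$$

$$\begin{aligned}
\hat g_2(x) ={}& \frac{1}{8\pi} \int_{B_1(x)}^{C_1(x)} \left( \sqrt{B_2(p,x)} \ln B_2(p,x) + \sqrt{C_2(p,x)} \ln C_2(p,x) \right) \cdot \left( \frac{1}{\ln p} + \frac{\ln p + 2}{16\pi\sqrt{p}} \right) dp \\
&+ \frac{1}{4\pi} \sqrt{B_1(x)} \ln B_1(x) \int_{B_2(B_1(x),x)}^{C_2(B_1(x),x)} \frac{1}{\ln q} \, dq \\
&+ \frac{1}{4\pi} \sqrt{C_1(x)} \ln C_1(x) \int_{B_2(C_1(x),x)}^{C_2(C_1(x),x)} \frac{1}{\ln q} \, dq \\
&+ \frac{1}{32\pi^2} \sqrt{B_1(x)} \ln B_1(x) \left( \sqrt{B_2(B_1(x),x)} \ln B_2(B_1(x),x) + \sqrt{C_2(B_1(x),x)} \ln C_2(B_1(x),x) \right) \\
&+ \frac{1}{32\pi^2} \sqrt{C_1(x)} \ln C_1(x) \left( \sqrt{B_2(C_1(x),x)} \ln B_2(C_1(x),x) + \sqrt{C_2(C_1(x),x)} \ln C_2(C_1(x),x) \right) \\
&+ \frac{1}{8\pi} \int_{B_1(x)}^{C_1(x)} \int_{B_2(p,x)}^{C_2(p,x)} \frac{\ln p + 2}{2\sqrt{p} \ln q} \, dq \, dp.
\end{aligned}$$

It remains to estimate $\tilde g_2(x)$ and $\hat g_2(x)$ suitably sharply.

For $(p, q) \in \mathcal{A}_x$ we frequently use the estimate $\ln p, \ln q \in [c_1, c_2] \ln x$. For the main
term we obtain

$$\tilde g_2(x) \in \left[ \frac{1}{4c_2^2}, \frac{1}{4c_1^2} \right] \cdot \frac{4\,\mathrm{area}(\mathcal{A}_x)}{\ln^2 x}.$$

We also read off the exact expression $\tilde a(x) = \frac{\ln^2 x}{4\,\mathrm{area}(\mathcal{A}_x)} \, \tilde g_2(x)$.

We treat the error term $\hat g_2(x)$ part by part. For the first term we obtain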

1 rCi{x) , , / 1 lnn + 2\ 

^/ (yB^(?:i),„S,(p,x) + ycy?-5l„C,(p,x)).(- + ^j dp 



1 {'^"^ Fx , / x\ 3 



^ir A/ - In - dp 



3 1 f^lnf- 
47rcilna; y^ci ]/ p \p 



^31/, 2 \ l+£2 / _i i±£2\ 



where we used in the second line that ^\^J^ < for all p > 2. Basic calculus shows that 

'°^2^^^'* maximal at p = exp(\/5 + 1), where it is less than 1.68. For the fourth line 
note that 

^ln(^) dp=2p,[^(\n(-)+2 
p \pj \ P \ \Pj 

The definite integral is not greater than this function evaluated at p = x'^^ since ci < ^. 
Using C2 > gives the claim. 
The second term yields 

1 l'C2iBi{x),x) 

— V5i(x)lnSi(x) / — dq 



1 



< ^ — V^i {x)C2 {Bi (x) , x) In Bi {x) 

Svrci In X 

^ 1 i±£2 / _i i±£2 

< X 2 G O Ci X 2 

- Svrci V 1 



1+C2 



since we have Bi{x)C2{Bi{x) , x)a/C2(-Bi(x),x) < x 2 and In -Bi(x) < In x. 
Similarly we obtain for the third term 

I j'C2{Ci{x),x) 



1 , j-<^2y'^iyx},x) 1 

— V^^taCi(x) / —Aq 
oTT Jb2(CUx).x) ing' 



S2(Ci(a;),x) Ing' 
1 1+^2 / _i 1±£2\ 

< - — X 2 eO Ic.'^x 2 , 
- Sttci V 1 y ' 

1+02 



using Ci(x)C2(Ci (x), x) C2(Ci(x), x) < x 2 andlnCi(x) < Inx. 
The fourth term yields 

^V^i(x)ln5i(x) (^^/B2iB^{x),x)lnB2{B^{x),x) + VC2(Si(x), x) In C2(Si(x), x)) 
^Y^v^l°'^eo(x^), 
where we used (3.10) and the (very weak) bound $\ln B_1(x), \ln C_2(p, x) \le \ln x$. The fifth
term can be treated similarly. We finish by observing for the sixth term



1 rCi{x) /-Cab,!) jjjp _|_ 2 
Stt is, (^) Jb2(p,x) 2^1ng 



1 1 /-CiW rC2(p,x) 
- Stt ci hi ar 7^, Jb^(v.x) ^/P 



Bi(a:) Jb2(j>,x) VP 



dq dp 



< — ['^ dq dp 

~ STTClhlX J^ci Jo 

1 1 Z-^'' Inp 



Stt ci In X ^ p3/2 

< 1 + - — ]x^-^ 

47r ci V In X / 



using $B_1(x) \ge x^{c_1}$, $c_1 \le \frac{1}{2}$, and

$$\int \frac{\ln p}{p^{3/2}} \, dp = \frac{-2(\ln p + 2)}{\sqrt{p}}.$$

This completes the proof. □

In specific situations one may obtain better estimates. In particular, when we substitute
$C_2(p, x)$ by $x/p$ in the estimation of the sixth summand of the error we may lose much.

Of course we can generalize this lemma to notions composed of few monotone ones. We
leave the details to the reader. As mentioned before, in many standards the selection of the
primes p and q is additionally subject to the side condition that $\gcd((p-1)(q-1), e) = 1$
for some fixed public exponent e of the RSA cryptosystem. To handle these restrictions, we
prove

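Theorem 3.11. Let $e \in \mathbb{N}_{>2}$ be a public RSA exponent and $x \in \mathbb{R}$. Then under the
Extended Riemann Hypothesis we have for the number $\pi_e(x)$ of primes $p \le x$ with
$\gcd(p - 1, e) = 1$ that

$$\pi_e(x) \in \frac{\varphi_1(e)}{\varphi(e)} \cdot \mathrm{li}(x) + O\left( \sqrt{x} \ln x \right),$$

where $\mathrm{li}(x) = \int_0^x \frac{dt}{\ln t}$ is the integral logarithm, $\varphi(e)$ is Euler's totient function and

$$\frac{\varphi_1(e)}{\varphi(e)} = \prod_{\substack{\ell \mid e \\ \ell \text{ prime}}} \left( 1 - \frac{1}{\ell - 1} \right).$$

Proof. We first show that the number of elements in $\mathbb{Z}_e^\times \cap (1 + \mathbb{Z}_e^\times)$ is exactly $\varphi_1(e)$.
Write $e = \prod_{\ell \mid e,\ \ell \text{ prime}} \ell^{f(\ell)}$. Observe that by the Chinese Remainder Theorem we have

$$\mathbb{Z}_e^\times \cap (1 + \mathbb{Z}_e^\times) \cong \prod_{\substack{\ell \mid e \\ \ell \text{ prime}}} \left( \mathbb{Z}_{\ell^{f(\ell)}}^\times \cap (1 + \mathbb{Z}_{\ell^{f(\ell)}}^\times) \right)$$

and each factor in this expression has size $(\ell - 2)\ell^{f(\ell) - 1}$. Multiplying up all factors gives

$$\#\left( \mathbb{Z}_e^\times \cap (1 + \mathbb{Z}_e^\times) \right) = \prod_{\substack{\ell \mid e \\ \ell \text{ prime}}} \left( 1 - \frac{1}{\ell - 1} \right) \left( 1 - \frac{1}{\ell} \right) \ell^{f(\ell)} = \varphi_1(e).$$

To show the result for $\pi_e(x)$ note that Oesterlé (1979) implies the following quantitative
version of Dirichlet's theorem on the number $\pi_{e;a}(x)$ of primes $p \le x$ with $p \equiv a$ in $\mathbb{Z}_e$,
when $\gcd(a, e) = 1$, under the Extended Riemann Hypothesis:

$$\left| \pi_{e;a}(x) - \frac{1}{\varphi(e)} \cdot \mathrm{li}(x) \right| \le \sqrt{x} \, (\ln x + 2 \ln e).$$

This is also documented in Bach & Shallit (1996, Theorem 8.8.18). We now have to sum
over $\varphi_1(e)$ residue classes and so obtain

$$\pi_e(x) \in \frac{\varphi_1(e)}{\varphi(e)} \cdot \mathrm{li}(x) + O\left( \varphi_1(e) \sqrt{x} \ln x \right),$$

which proves the claim. □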

This theorem shows that the prime pair approximation in Lemma 3.6 can be easily adapted 
to RSA integers whose prime factors satisfy the conditions of Theorem 3.11 (when assuming 
the Extended Riemann Hypothesis), since the density of such primes differs for every fixed 
e essentially just by a multiplicative constant compared to the density of arbitrary primes. 
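The counting argument in the proof of Theorem 3.11 is easy to replay for small exponents. The following is a minimal sketch with hypothetical helper names: it computes $\varphi_1(e) = \prod_{\ell \mid e} (\ell - 2)\ell^{f(\ell)-1}$ from the factorization of e, compares it with a brute-force count of the residues a for which both a and a - 1 are units modulo e, and counts the primes p with $\gcd(p - 1, e) = 1$ up to a bound.

```python
from fractions import Fraction
from math import gcd, isqrt


def primes_up_to(n):
    """Return all primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def prime_factorization(e):
    """Trial division: map each prime divisor of e to its exponent."""
    factors = {}
    d = 2
    while d * d <= e:
        while e % d == 0:
            factors[d] = factors.get(d, 0) + 1
            e //= d
        d += 1
    if e > 1:
        factors[e] = factors.get(e, 0) + 1
    return factors


def phi(e):
    """Euler's totient: product of (l - 1) * l^(f - 1) over prime powers l^f of e."""
    result = 1
    for l, f in prime_factorization(e).items():
        result *= (l - 1) * l ** (f - 1)
    return result


def phi1(e):
    """Product of (l - 2) * l^(f - 1): the size of Z_e^x intersected with 1 + Z_e^x."""
    result = 1
    for l, f in prime_factorization(e).items():
        result *= (l - 2) * l ** (f - 1)
    return result


def count_shifted_units(e):
    """Brute force: residues a mod e such that both a and a - 1 are units."""
    return sum(1 for a in range(e) if gcd(a, e) == 1 and gcd((a - 1) % e, e) == 1)


def density(e):
    """The constant phi1(e) / phi(e) in front of li(x)."""
    return Fraction(phi1(e), phi(e))


def primes_with_coprime_predecessor(x, e):
    """pi_e(x): number of primes p <= x with gcd(p - 1, e) = 1."""
    return sum(1 for p in primes_up_to(x) if gcd(p - 1, e) == 1)


if __name__ == "__main__":
    for e in (3, 17, 65537):
        print(e, phi1(e), density(e), primes_with_coprime_predecessor(10**5, e))
```

For e = 3 about half of all primes qualify, while for the common choice e = 65537 the density is 65535/65536, so almost no primes are excluded.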

4. Some common definitions for RSA integers 

We will now give formal definitions of three specific notions of RSA integers. In particular, 
we consider the following example definitions within our framework: 

o The number theoretically inspired notion following Decker & Moree. Note that this
occurs in no standard and no implementation.

o The simple construction given by just choosing two primes in given intervals. This
construction occurs in several standards, like the standard of the RSA foundation
(RSA Laboratories 2000), the standard resulting from the European NESSIE project
(Preneel et al. 2003) and the FIPS 186-3 standard (NIST 2009). Also open source
implementations of OpenSSL (Cox et al. 2009), GnuPG (Skala et al. 2009) and the
GNU crypto library GNU Crypto (Free Software Foundation 2009) use some variant
of this construction.

o An algorithmically inspired construction which allows one prime to be chosen
arbitrarily, with the second chosen such that the product is in the desired interval. This
was for example specified as the IEEE standard 1363 (IEEE working group 2000),
Annex A.16.11. However, we could not find any implementation following this
standard.



Figure 4.1: Three notions of RSA integers.



4.1. A number theoretically inspired notion. In Decker & Moree (2008), on the suggestion
of B. de Weger, the number $C_r(x)$ of RSA integers up to x was defined as the count of
numbers whose two prime factors differ by at most a factor r, namely

$$C_r(x) := \#\{ n \in \mathbb{N} \mid \exists p, q \in \mathbb{P}\colon n = pq \wedge p < q < rp \wedge n \le x \}.$$

Written as a notion of RSA integers in the sense above, we analyze

$$\mathcal{A}^{\mathrm{DM}(r)} := \left\langle \left\{ (y, z) \in \mathbb{R}^2_{>1} \;\middle|\; \frac{y}{r} < z < ry \wedge \frac{x}{r} < yz \le x \right\} \right\rangle_{x \in \mathbb{R}_{>1}}. \tag{4.1}$$

Note that the prime pair counting function of this notion is closely related to the function
$C_r(x)$: Namely we have

$$\#\mathcal{A}^{\mathrm{DM}(r)}(x) = 2\left( C_r(x) - C_r\left(\frac{x}{r}\right) \right) + \left( \pi\left(\sqrt{x}\right) - \pi\left(\sqrt{\frac{x}{r}}\right) \right),$$

where the last part is comparatively small. We now analyze the behavior of the function
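$\#\mathcal{A}^{\mathrm{DM}(r)}$ under the Riemann hypothesis. Similar to Decker & Moree (2008), we rewrite

$$\frac{1}{2} \cdot \#\mathcal{A}^{\mathrm{DM}(r)}(x) = \sum_{p \in \mathbb{P} \cap \left]\frac{\sqrt{x}}{r}, \sqrt{\frac{x}{r}}\right]} \;\; \sum_{q \in \mathbb{P} \cap \left]\frac{x}{rp}, rp\right[} 1 \;+ \sum_{p \in \mathbb{P} \cap \left]\sqrt{\frac{x}{r}}, \sqrt{x}\right]} \;\; \sum_{q \in \mathbb{P} \cap \left]p, \frac{x}{p}\right]} 1 \;+\; \frac{1}{2}\left( \pi\left(\sqrt{x}\right) - \pi\left(\sqrt{\frac{x}{r}}\right) \right). \tag{4.2}$$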

With these bounds we obtain using Lemma 3.6: 
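Theorem 4.3. Under the Riemann hypothesis we have

$$\#\mathcal{A}^{\mathrm{DM}(r)}(x) \in \tilde a(x) \frac{4x}{\ln^2 x} \left( \ln r - \frac{\ln r}{r} \right) + O\left( x^{\frac{3}{4}} r^{\frac{1}{2}} \right)$$

with $\tilde a(x) \in \left[ \left( 1 - \frac{\ln r}{\ln x + \ln r} \right)^2, \left( 1 + \frac{2 \ln r}{\ln x - 2 \ln r} \right)^2 \right]$. This makes sense as long as
$r \in O\left( x^{\frac{1}{2} - \varepsilon} \right)$ for some $\varepsilon > 0$. If additionally $\ln r \in o(\ln x)$ then $\tilde a(x) \in 1 + o(1)$.

You may want to sum this up as $\#\mathcal{A}^{\mathrm{DM}(r)}(x) \in (1 + o(1)) \frac{4x}{\ln^2 x} \left( \ln r - \frac{\ln r}{r} \right)$. However, you
then forego the option of actually calculating $\tilde a(x)$.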
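The relation between the prime pair count of the notion (4.1) and the function $C_r$ can be verified by brute force for small x. The sketch below is an illustration under stated assumptions: x and r are taken to be integers so that the comparisons are exact, the helper names are hypothetical, and `main_term` evaluates $\frac{4x}{\ln^2 x}(\ln r - \frac{\ln r}{r})$ without the factor $\tilde a(x)$, so for small x it should not be expected to match the count closely.

```python
from math import isqrt, log


def primes_up_to(n):
    """Return all primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def count_dm(x, r):
    """Ordered prime pairs with p/r < q < r*p and x/r < pq <= x."""
    primes = primes_up_to(int(x) // 2)
    count = 0
    for p in primes:
        for q in primes:
            if p * q > x:
                break
            if p < r * q and q < r * p and x < r * p * q:
                count += 1
    return count


def c_r(x, r):
    """C_r(x): number of n = pq <= x with primes p < q < r*p (x may be a float)."""
    primes = primes_up_to(int(x) // 2)
    count = 0
    for p in primes:
        for q in primes:
            if p * q > x:
                break
            if p < q < r * p:
                count += 1
    return count


def prime_squares(x, r):
    """Number of primes p with x/r < p^2 <= x, for an integer x."""
    return sum(1 for p in primes_up_to(isqrt(int(x))) if x < r * p * p)


def main_term(x, r):
    """Main term of the asymptotic count, without the factor a~(x)."""
    return 4 * x / log(x) ** 2 * (log(r) - log(r) / r)


if __name__ == "__main__":
    x, r = 10_000, 2
    print(count_dm(x, r), 2 * (c_r(x, r) - c_r(x / r, r)) + prime_squares(x, r))
    print(main_term(x, r))
```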

Proof. Consider x large enough such that all sum boundaries are beyond 2657, i.e.
$\frac{\sqrt{x}}{r} \ge 2657$. By definition $\mathcal{A}^{\mathrm{DM}(r)}$ is a notion of tolerance r. Further it is $[c_1, c_2]$-balanced with
$c_1 = \log_x \frac{\sqrt{x}}{r} = \frac{1}{2} - \frac{\ln r}{\ln x}$ and $c_2 = \log_x\left(\sqrt{rx}\right) = \frac{1}{2} + \frac{\ln r}{2 \ln x}$. As depicted next to (4.1),
we treat the upper half of the notion as the union of those two notions matching the two
double sums in (4.2), which both inherit being $[c_1, c_2]$-balanced of tolerance r. Considering
the inner bounds $\frac{x}{rp}$ to $rp$ and $p$ to $\frac{x}{p}$, respectively, as a function of the outer variable p, we
observe that the lower and upper bound in each case have opposite monotonicity behavior
and thus by Lemma 3.4 each part is a monotone notion. We can thus apply Lemma 3.6.
Under the restriction $\ln r \in o(\ln x)$ we have $c_1, c_2 \in \frac{1}{2} + o(1)$, which implies that
$\frac{1}{c_i^2} \in 4(1 + o(1))$ for both $i \in \{1, 2\}$. Computing the area of the two parts yields

$$\int_{\frac{\sqrt{x}}{r}}^{\sqrt{\frac{x}{r}}} \int_{\frac{x}{rp}}^{rp} 1 \, dq \, dp = \frac{1}{2} \cdot x \left( 1 - \frac{1}{r} - \frac{\ln r}{r} \right)$$

and

$$\int_{\sqrt{\frac{x}{r}}}^{\sqrt{x}} \int_{p}^{\frac{x}{p}} 1 \, dq \, dp = \frac{1}{2} \cdot x \left( \ln r - 1 + \frac{1}{r} \right).$$

For the error term we obtain $O\left( x^{\frac{3}{4}} r^{\frac{1}{2}} \right)$, noting that the number $\pi\left(\sqrt{x}\right)$ of prime squares up
to x is at most $\sqrt{x}$. □

Actually, we can even prove that the error term is in $O\left( x^{\frac{3}{4}} r^{\frac{1}{4}} \right)$. We lost this in the last steps
of the proof of Lemma 3.6 when we replaced $C_2(p, x) = rp$ by $x/p$.

4.2. A fixed bound notion. A second possible definition for RSA integers can be stated as 
follows: We consider the number of integers smaller than a real positive bound x that have 
exactly two prime factors p and q, both lying in a fixed interval ]5, C], in formula: 



3p,q£¥n]B,C]: 
n = pq A n < X 



To avoid problems with rare prime squares, which are also not interesting when talking about 
RSA integers, we instead count 



4,C (x) := # { (p, 9) G (P n ]B, C]f \pq<xY 



hiz 



Such functions are treated in Loebenberger & Niisken (2010) 
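In the context of RSA integers we consider the notion

(4.4)
$$A^{FB(r,\sigma)} := \left\langle \left\{ (y, z) \in \mathbb{R}^2_{>1} \;\middle|\; \sqrt{\tfrac{x}{r}} < y, z \le \sqrt{r^{\sigma} x} \wedge yz \le x \right\} \right\rangle_{x \in \mathbb{R}_{>1}}$$

with $\sigma \in [0, 1]$. The parameter $\sigma$ describes the (relative) distance of the restriction $yz \le x$ to
the center of the rectangle in which y and z are allowed. We split the corresponding counting
function into two double sums:

(4.5)
$$\# A^{FB(r,\sigma)}(x) = \sum_{p \in \mathbb{P} \cap \left]\sqrt{x/r}, \sqrt{x/r^{\sigma}}\right]} \; \sum_{q \in \mathbb{P} \cap \left]\sqrt{x/r}, \sqrt{r^{\sigma} x}\right]} 1 \;+\; \sum_{p \in \mathbb{P} \cap \left]\sqrt{x/r^{\sigma}}, \sqrt{r^{\sigma} x}\right]} \; \sum_{q \in \mathbb{P} \cap \left]\sqrt{x/r}, x/p\right]} 1.$$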
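For small x the split of the counting function can be compared against a direct count. The sketch below assumes the fixed bound notion consists of the pairs with $\sqrt{x/r} < y, z \le \sqrt{r^{\sigma} x}$ and $yz \le x$, and that the split point of the outer variable is $x / \sqrt{r^{\sigma} x}$; the function names are hypothetical.

```python
import math


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def _primes_in_box(x: int, r: float, sigma: float) -> tuple[list[int], float]:
    a = math.sqrt(x / r)             # lower bound for both primes
    b = math.sqrt(r ** sigma * x)    # upper bound for both primes
    return [p for p in primes_up_to(int(b)) if a < p <= b], b


def count_fb_split(x: int, r: float, sigma: float) -> int:
    """Count prime pairs via two double sums (assumed split, see lead-in)."""
    ps, b = _primes_in_box(x, r, sigma)
    total = 0
    for p in ps:
        if p <= x / b:
            total += len(ps)                              # q ranges over the whole interval
        else:
            total += sum(1 for q in ps if q <= x // p)    # q is cut off at x/p
    return total


def count_fb_brute(x: int, r: float, sigma: float) -> int:
    """Direct count of prime pairs (p, q) in the box with p*q <= x."""
    ps, _ = _primes_in_box(x, r, sigma)
    return sum(1 for p in ps for q in ps if p * q <= x)
```

The bounds are computed in floating point, so this is only meant for toy sizes where no prime sits on a boundary.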



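The next theorem follows directly from Loebenberger & Nüsken (2010) but we can also
derive it from Lemma 3.6 similar to Theorem 4.3.

Theorem 4.6. We have under the Riemann hypothesis

$$\# A^{FB(r,\sigma)}(x) \in a(x) \frac{4x}{\ln^2 x} \left( \sigma \ln r + 1 - \frac{2}{r^{(1-\sigma)/2}} + \frac1r \right) + O\left( x^{3/4} r^{1/4} \right)$$

with $a(x) \in \left[ \left( 1 - \frac{\sigma \ln r}{\ln x + \sigma \ln r} \right)^2, \left( 1 + \frac{\ln r}{\ln x - \ln r} \right)^2 \right]$.
If additionally $\ln r \in o(\ln x)$ then $a(x) \in 1 + o(1)$.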
Proof. Let x be such that all sum boundaries are beyond 2657. By definition $A^{FB(r,\sigma)}$
is a notion of tolerance r. Further for all $\sigma \in [0,1]$ it is clearly $[c_1, c_2]$-balanced with
$c_1 = \log_x \sqrt{x/r} = \frac12 - \frac{\ln r}{2 \ln x}$ and $c_2 = \log_x \sqrt{r^{\sigma} x} = \frac12 + \frac{\sigma \ln r}{2 \ln x}$.
As depicted next to (4.4), we treat the notion as the union of two notions corresponding to the
two double sums in (4.5), which are both $[c_1, c_2]$-balanced of tolerance r.

Consider the inner bounds $\sqrt{x/r}$ to $\sqrt{r^{\sigma} x}$ and $\sqrt{x/r}$ to $\frac{x}{p}$, respectively, as a function
of the outer variable p (while $\sigma$ is fixed): We observe that the lower and upper bound in the first
case are constant and in the second case consist of a constant lower bound and a weakly
decreasing upper bound. Thus by Lemma 3.4 each part is a monotone notion and we can
apply Lemma 3.6.

As in the proof of Theorem 4.3, we have under the additional restriction $\ln r \in o(\ln x)$
that $\frac{1}{c_i^2} \in 4(1 + o(1))$ for both $i \in \{1, 2\}$. Computing the area of the two parts yields

$$\int_{\sqrt{x/r}}^{\sqrt{x/r^{\sigma}}} \int_{\sqrt{x/r}}^{\sqrt{r^{\sigma} x}} 1 \, dq \, dp = x \left( 1 - r^{-(1+\sigma)/2} - r^{-(1-\sigma)/2} + \frac1r \right)$$

and

$$\int_{\sqrt{x/r^{\sigma}}}^{\sqrt{r^{\sigma} x}} \int_{\sqrt{x/r}}^{x/p} 1 \, dq \, dp = x \left( \sigma \ln r - r^{-(1-\sigma)/2} + r^{-(1+\sigma)/2} \right).$$

□

4.3. An algorithmically inspired notion. A third option to define RSA integers is the
following notion: Assume you wish to generate an RSA integer between $\frac{x}{r}$ and x, which
has two prime factors of roughly equal size. Then algorithmically we might first generate
the prime p and afterward select the prime q such that the product is in the correct interval.
As we will see in Section 6, this procedure does not, however, produce every number with the
same probability. Formally, we consider the notion

(4.7)
$$A^{ALG_1(r)} := \left\langle \left\{ (y, z) \in \mathbb{R}^2_{>1} \;\middle|\; \frac{\sqrt{x}}{r} < y \le \sqrt{x}, \; \frac{x}{ry} < z \le \frac{x}{y} \right\} \right\rangle_{x \in \mathbb{R}_{>1}}.$$
We proceed with this notion similar to the previous one. By observing

(4.8)
$$\# A^{ALG_1(r)}(x) = \sum \sum 1 + \sum \sum 1$$

and again applying Lemma 3.6 and Lemma 3.4 we obtain
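Theorem 4.9. We have under the Riemann hypothesis

$$\# A^{ALG_1(r)}(x) \in a(x) \frac{4x}{\ln^2 x} \left( \ln r - \frac{\ln r}{r} \right) + O\left( x^{3/4} r^{1/2} \right)$$

with $a(x) \in \left[ \left( 1 - \frac{2 \ln r}{\ln x + 2 \ln r} \right)^2, \left( 1 + \frac{2 \ln r}{\ln x - 2 \ln r} \right)^2 \right]$.
If additionally $\ln r \in o(\ln x)$ then $a(x) \in 1 + o(1)$.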

Proof. Again let x be such that all sum boundaries are beyond 2657. By definition
$A^{ALG_1(r)}$ is a notion of tolerance r. Further it is clearly $[c_1, c_2]$-balanced with
$c_1 = \log_x(\sqrt{x}/r) = \frac12 - \frac{\ln r}{\ln x}$ and $c_2 = \log_x(r\sqrt{x}) = \frac12 + \frac{\ln r}{\ln x}$.
As depicted next to (4.7), we treat the notion as the union of two notions corresponding to the
two double sums in (4.8), which are both $[c_1, c_2]$-balanced of tolerance r.

If we consider the inner bounds $\frac{\sqrt{x}}{r}$ to $\frac{x}{p}$ and $\frac{x}{rp}$ to $\sqrt{x}$, respectively, as a function of
the outer variable p, we observe that in both cases one of them is constant and the other
decreasing. Furthermore by Lemma 3.4 each part is a monotone notion. We can thus apply
Lemma 3.6.

As for the previous notions we have under the additional restriction $\ln r \in o(\ln x)$ that
$\frac{1}{c_i^2} \in 4(1 + o(1))$ for both $i \in \{1, 2\}$. Computing the area of the two parts yields

$$\int_{\sqrt{x}/r}^{\sqrt{x}} \int_{x/(rp)}^{\sqrt{x}} 1 \, dq \, dp = x \left( 1 - \frac1r - \frac{\ln r}{r} \right)$$

and

$$\int_{\sqrt{x}}^{r\sqrt{x}} \int_{\sqrt{x}/r}^{x/p} 1 \, dq \, dp = x \left( \ln r - 1 + \frac1r \right).$$

For the error term we obtain $O(x^{3/4} r^{1/2})$. □



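Note that we also could have employed Lemma 3.5, but in this particular case we decided to
use another split of the notion.

The IEEE standard P1363 suggests a slight variant; both generalize to

(4.10)
$$A^{ALG(r,\sigma)} := \left\langle \left\{ (y, z) \in \mathbb{R}^2_{>1} \;\middle|\; r^{\sigma - 1} \sqrt{x} < y \le r^{\sigma} \sqrt{x}, \; \frac{x}{ry} < z \le \frac{x}{y}, \; \frac{x}{r} < yz \le x \right\} \right\rangle_{x \in \mathbb{R}_{>1}}$$

with $\sigma \in [0, 1]$. Now, our notion above is $A^{ALG(r,0)}$, the IEEE variant is $A^{ALG(r,1/2)}$. By
similar reasoning as above we obtain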






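Theorem 4.11. We have under the Riemann hypothesis

$$\# A^{ALG(r,\sigma)}(x) \in a(x) \frac{4x}{\ln^2 x} \left( \ln r - \frac{\ln r}{r} \right) + O\left( x^{3/4} r^{(1+\sigma)/2} \right)$$

with $a(x) \in \left[ \left( 1 - \frac{2 \sigma' \ln r}{\ln x + 2 \sigma' \ln r} \right)^2, \left( 1 + \frac{2 (1+\sigma) \ln r}{\ln x - 2 (1+\sigma) \ln r} \right)^2 \right]$,
where $\sigma' = \max(\sigma, 1 - \sigma)$. If additionally $\ln r \in o(\ln x)$ then $a(x) \in 1 + o(1)$. □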



4.4. Summary. As we see, all notions, summarized in Figure 4.1, open a slightly different 
view. However the outcome is not that different, at least the numbers of described RSA 
integers are quite close to each other, see Section 5. 

Current standards and implementations of various crypto packages mostly use the notions
$A^{FB(4,0)}$, $A^{FB(4,1)}$, $A^{FB(2,0)}$ or $A^{ALG(2,1/2)}$. For details see Section 9.

5. Arbitrary notions 

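The preceding examinations show that the order of the analyzed functions differs by a factor
that only depends on the notion parameters, i.e. on r and $\sigma$. Summarizing:

Theorem. Assuming $\ln r \in o(\ln x)$ and $r > 1$ and $\sigma \in [0, 1]$ we have

(i) $\# A^{DM(r)}(x) \in (1 + o(1)) \frac{4x}{\ln^2 x} \left( \ln r - \frac{\ln r}{r} \right)$,

(ii) $\# A^{FB(r,\sigma)}(x) \in (1 + o(1)) \frac{4x}{\ln^2 x} \left( \sigma \ln r + 1 - \frac{2}{r^{(1-\sigma)/2}} + \frac1r \right)$,

(iii) $\# A^{ALG_1(r)}(x) \in (1 + o(1)) \frac{4x}{\ln^2 x} \left( \ln r - \frac{\ln r}{r} \right)$.

□

It is obvious that the three considered notions with many parameter choices cover about the
same number of integers.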

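To obtain a much more general result, we consider the following maximal notion

(5.1)
$$M^{r,c_1} := \left\langle \left\{ (y, z) \in \mathbb{R}^2_{>1} \;\middle|\; x^{c_1} \le y \le x^{1-c_1}, \; x^{c_1} \le z \le x^{1-c_1}, \; \frac{x}{r} < yz \le x \right\} \right\rangle_{x \in \mathbb{R}_{>1}}.$$

All of the notions discussed in Section 4 are subsets of this notion. Using the same
techniques as above, we obtain: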

Theorem 5.2. For $\ln r \in o(\ln x)$ we have under the Riemann hypothesis

(i) For $c_1 \le \frac12 - \frac{\ln r}{2 \ln x}$ and for some $\ell$ and large x additionally $c_1 \ge \frac12 - \ln^{\ell} x \ln r$, we
have that

$$\# M^{r,c_1}(x) \in a(x) \frac{4x}{\ln^2 x} \left( (1 - 2c_1) \left( 1 - \frac1r \right) \ln x - 1 + \frac{1 + \ln r}{r} \right) + O\left( c_1^{-1} x^{1 - c_1/2} \ln^{\ell+1} x \right).$$

(ii) When $c_1 > \frac12 - \frac{\ln r}{2 \ln x}$ we obtain the fixed bound notion

$$\# M^{r,c_1}(x) \in a(x) \frac{4x}{\ln^2 x} \left( (1 - 2c_1) \ln x + x^{2c_1 - 1} - 1 \right) + O\left( c_1^{-1} x^{1 - c_1/2} \right).$$

This is independent of r.

In both cases $a(x) \in \left[ \frac{1}{4(1-c_1)^2}, \frac{1}{4 c_1^2} \right]$. In particular for $c_1 \in \frac12 + o(1)$ we have $a(x) \in 1 + o(1)$.

Case (i) considers the case where the notion $M^{r,c_1}$ looks like a thin band. The other
alternative (ii) treats the case where the notion is actually a triangle, namely the notion
$A^{FB(x^{1-2c_1},1)}$. In the former case we have to make sure that the band is not too long so that
we may apply Lemma 3.6 for not too many pieces. As noted after Definition 3.3, the first
case could still be somewhat extended.



Proof. As usual let x be such that all sum boundaries are beyond 2657. By definition
$M^{r,c_1}$ is a notion of tolerance r. Further it is clearly $[c_1, 1 - c_1]$-balanced. For $c_1 > \frac12 - \frac{\ln r}{2 \ln x}$
the result follows directly from Theorem 4.6, since $M^{r,c_1}$ is simply the fixed bound notion
$A^{FB(x^{1-2c_1},1)}$.

For $c_1 \le \frac12 - \frac{\ln r}{2 \ln x}$ we treat the notion as the sum of several monotone, $[c_1, 1 - c_1]$-balanced
notions of tolerance r by triangulating the maximal notion as indicated in the picture
next to (5.1). The number m of necessary cuts is $(1 - 2c_1) \frac{\ln x}{\ln r}$ which is in $O(\ln^{\ell+1} x)$
by assumption. This gives by Lemma 3.6 the claim. □

We obtain 

Theorem 5.3. Let $c_1, c_2 \in \frac12 + o(1)$, $r > 1$ with $\ln r \in \Omega\left( \frac{1 - 2c_1}{\ln^{\ell} x} \right) \cap o(\ln x)$ be possibly
x-dependent values, and $a \in \left]0, 1\right[$ constant. Consider a piecewise monotone notion A of
RSA integers with tolerance r such that for large $x \in \mathbb{R}_{>1}$ we have area $A_x \ge ax$. Then

$$\# A(x) = \frac{4x}{\ln^2 x} \cdot a(x)$$

where $a(x) \in o(\ln x)$ and $a(x) \ge a - \varepsilon(x)$ for some $\varepsilon(x) \in o(1)$.

In particular, the prime pair counts of two such notions can differ by at most a factor of
order $o(\ln x)$.

Proof. Let A be as specified. Assume x to be large enough to grant that area $A_x \ge ax$
and $x^{c_1} \ge 2657$. Without loss of generality we assume $c_1 + c_2 \le 1$. Otherwise we replace
$c_2 = 1 - c_1$. Denote $c := \max(2c_2 - 1, 1 - 2c_1)$, this now is always in [0, 1]. By Lemma 3.6
we obtain

4x 3+c 

#A (x) > a(x), a(x) G 0(x~). 

In X 

To provide an upper bound, we consider the $[c_1, 1 - c_1]$-balanced maximal notion (5.1). As
mentioned above we have for all $x \in \mathbb{R}_{>1}$ that $A_x \subset M^{r,c_1}_x$, and so $\# A(x) \le \# M^{r,c_1}(x)$.

Figure 5.1: Enclosing notions of RSA integers using others.

Note that $c_1 \le \frac12$, as otherwise $A_x$ would be empty rather than having area at least ax. By
assumption we have $c_1 \in \frac12 + o(1)$ and thus $0 \le 1 - 2c_1 \in o(1)$. Now the claim follows
from Theorem 5.2 and the assumption $\ln r \in o(\ln x)$. □

In the following we will analyze the relation between the proposed notions in more
detail. Namely, we carefully check how each of the notions can be enclosed in terms of the
others. Clearly the fixed bound notions $A^{FB(r,\sigma)}$ enclose each other:

Lemma 5.4. For $r \in \mathbb{R}_{>1}$, $x \in \mathbb{R}_{>1}$ and $\sigma, \sigma' \in [0, 1]$ with $\sigma \le \sigma'$ we have

$$\# A^{FB(\sqrt{r},1)}\left( \tfrac{x}{\sqrt{r}} \right) \le \# A^{FB(r,0)}(x) \le \# A^{FB(r,\sigma)}(x) \le \# A^{FB(r,\sigma')}(x) \le \# A^{FB(r,1)}(x).$$

Proof. For the first inequality simply observe that $\frac{x}{\sqrt{r}} \le x$. The remaining inequalities
follow from the fact that $\sqrt{r^{\sigma} x} \le \sqrt{r^{\sigma'} x}$ whenever $\sigma \le \sigma'$. □

We can also enclose different notions by each other: 
Lemma 5.5. For $r \in \mathbb{R}_{>1}$ and $x \in \mathbb{R}_{>1}$ we have

$$\tfrac12 \# A^{FB(r,1)}(x) \le \tfrac12 \# A^{DM(r)}(x) \le \# A^{ALG_1(r)}(x) \le \# A^{FB(r^2,1)}(x).$$

Proof. We prove every inequality separately. For an easier understanding of the proof a
look at Figure 5.1 is advised:

$\frac12 \# A^{FB(r,1)} \le \frac12 \# A^{DM(r)}$: Consider the double sum (4.5)

$$\tfrac12 \# A^{FB(r,1)}(x) = \tfrac12 \sum_{p \in \mathbb{P} \cap \left]\sqrt{x/r}, \sqrt{rx}\right]} \; \sum_{q \in \mathbb{P} \cap \left]\sqrt{x/r}, x/p\right]} 1 = \sum_{p \in \mathbb{P} \cap \left]\sqrt{x/r}, \sqrt{x}\right]} \; \sum_{q \in \mathbb{P} \cap \left]p, x/p\right]} 1$$

due to the restriction $p < q$. This is exactly the second summand in (4.2).

$\frac12 \# A^{DM(r)} \le \# A^{ALG_1(r)}$: Consider again the double sum (4.2). We expand the
summation area for q (thus increasing the number of prime pairs we count) in order
to obtain the sum (4.8) for the algorithmic notion: For the first summand we obtain
from $p \le \sqrt{x/r}$ that $rp \le \frac{x}{p}$ and for the second summand from the same argument that
$\frac{x}{rp} \le p$. The third summand disappears while doing this, since the squares (which are
counted by the third summand) are now counted by the second summand. Thus we
can bound the whole sum from above by changing the summation area for q in this
way.

$\# A^{ALG_1(r)} \le \# A^{FB(r^2,1)}$: We proceed as in the previous step, by replacing in the
sum (4.8) the summation area for q: Since $p \le \sqrt{x}$, we obtain $\frac{x}{rp} \ge \frac{\sqrt{x}}{r}$. Now since
$\sqrt{x} < r\sqrt{x}$ the claim follows. □

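We actually can enclose the Decker & Moree notion even tighter by the fixed bound notion:

Lemma 5.6. For $r \in \mathbb{R}_{>1}$ and $x \in \mathbb{R}_{>1}$ we have

$$\# A^{FB(r,1)}(x) \le \# A^{DM(r)}(x) \le \# A^{FB(r^2,1/2)}(x).$$

Proof. Assume $\sqrt{x/r} < p \le q \le \sqrt{rx}$ and $pq \le x$. Then $\frac{x}{r} < pq \le x$ and $q < rp$. If on
the other hand $\frac{x}{r} < pq \le x$ and $p \le q < rp$, then $\frac{x}{r^2} < \frac{1}{r} pq < p^2 \le q^2 < rpq \le rx$ and the
claim follows. □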
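The enclosures can be sanity-checked on an integer grid. The sketch below assumes the following membership conditions, written with integer arithmetic for integer x and r: Decker & Moree means $y/r < z < ry$ and $x/r < yz \le x$; a fixed bound box means $\sqrt{x/r} < y, z \le \sqrt{s x}$ and $yz \le x$ (with $s = r^{\sigma}$); the algorithmic notion means $\sqrt{x}/r < y \le \sqrt{x}$ and $x/(ry) < z \le x/y$. The function names are hypothetical.

```python
def in_dm(y: int, z: int, x: int, r: int) -> bool:
    """Decker & Moree notion: y/r < z < r*y and x/r < y*z <= x."""
    return y < r * z and z < r * y and x < r * y * z and y * z <= x


def in_fb(y: int, z: int, x: int, r: int, s: int) -> bool:
    """Fixed bound box: sqrt(x/r) < y, z <= sqrt(s*x) and y*z <= x, where s = r**sigma."""
    def inside(v: int) -> bool:
        return x < r * v * v and v * v <= s * x
    return inside(y) and inside(z) and y * z <= x


def in_alg1(y: int, z: int, x: int, r: int) -> bool:
    """Algorithmic notion: sqrt(x)/r < y <= sqrt(x) and x/(r*y) < z <= x/y."""
    return x < r * r * y * y and y * y <= x and x < r * y * z and y * z <= x


def grid(limit: int) -> list[tuple[int, int]]:
    """All integer points (y, z) with 1 <= y, z <= limit."""
    return [(y, z) for y in range(1, limit + 1) for z in range(1, limit + 1)]
```

A check on integer points is of course no proof; it only guards against sign or index slips in the stated chains.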



All the inclusions described above are compatible with the result from Theorem 5.3. However,
many of the explicit inclusions are much tighter.



6. Generating RSA integers 

In this section we analyze how to generate RSA integers properly. This completes the picture,
and we found several implementations that overlook this kind of argument.
We wish that all the algorithms generate integers with the following properties:

o If we fix x we should with at least overwhelming probability generate integers that are
a product of a prime pair in $A_x$.

o These integers (not the pairs) should be selected roughly uniformly at random.

o The algorithm should be efficient. In particular, it should need only few primality
tests.

For the first point note that we usually use probabilistic primality tests with a very low
error probability, for example Miller (1976), Rabin (1980), Solovay & Strassen (1977), or
Artjuhov (1966/67). Deterministic primality tests are also available but are at present by far
too slow for these purposes.
6.1. Rejection sampling. Assume that A is a $[c_1, c_2]$-balanced notion of RSA integers
with tolerance r. The easiest approach for generating a pair from A is based on von Neumann's
rejection sampling method. For this the following definition comes in handy:

Definition 6.1 (Banner). A banner is a graph-bounded notion of RSA integers such that for
all $x \in \mathbb{R}_{>1}$ and for every prime $p \in [B_1(x), C_1(x)]$ the number $f_x(p)$ of primes in the interval
$[B_2(p, x), C_2(p, x)]$ is almost independent of p in the following sense:
$\frac{\max \{ f_x(p) \mid p \in \mathbb{P} \cap [B_1(x), C_1(x)] \}}{\min \{ f_x(p) \mid p \in \mathbb{P} \cap [B_1(x), C_1(x)] \}} \in 1 + o(1)$.

For example, a rectangular notion, where $B_2(p, x)$ and $C_2(p, x)$ do not depend on p,
is a banner. Now given any notion A of RSA integers we select a banner B of (almost)
minimal area enclosing $A_x$. Note that there may be many choices for B. We can easily
generate elements in $B_x \cap \mathbb{N}^2$: Select first an appropriate $y \in [B_1(x), C_1(x)] \cap \mathbb{N}$, second an
appropriate $z \in [B_2(y, x), C_2(y, x)] \cap \mathbb{N}$. By the banner property this chooses (y, z) almost
uniformly. We obtain the following straightforward Las Vegas algorithm:

Algorithm 6.2. Generating an RSA integer (Las Vegas version).
Input: A notion A, a bound $x \in \mathbb{R}_{>1}$.
Output: An integer n = pq with $(p, q) \in A_x$.

1. Repeat
2.   Repeat
3.     Select (y, z) at random from $B_x \cap \mathbb{N}^2$ as just described.
4.   Until $(y, z) \in A_x$.
5. Until y prime and z prime.
6. $p \leftarrow y$, $q \leftarrow z$.
7. Return pq.
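The loop structure of Algorithm 6.2 can be sketched as follows. This is only an illustration under stated assumptions: the notion is taken to be the fixed bound notion with $\sigma = 1$ (pairs with $\sqrt{x/r} < y, z \le \sqrt{rx}$ and $yz \le x$), the banner is the enclosing square, and trial division replaces a probabilistic primality test; the function names are hypothetical.

```python
import math
import random


def is_prime(n: int) -> bool:
    """Trial division; adequate only for the toy sizes used in this sketch."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def generate_pair_las_vegas(x: int, r: int = 2, rng=random) -> tuple[int, int]:
    """Rejection sampling from a square banner (assumed notion, see lead-in)."""
    lo = math.isqrt(x // r) + 1   # smallest integer y with r*y*y > x
    hi = math.isqrt(r * x)        # largest integer y with y*y <= r*x
    while True:                   # outer loop: repeat until both are prime
        while True:               # inner loop: repeat until the pair lies in the notion
            y, z = rng.randint(lo, hi), rng.randint(lo, hi)
            if y * z <= x:
                break
        if is_prime(y) and is_prime(z):
            return y, z
```

Keeping the cheap membership test in the inner loop means primality is only tested for pairs that already lie in the notion.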

The expected repetition count of the inner loop is $\frac{\#(B_x \cap \mathbb{N}^2)}{\#(A_x \cap \mathbb{N}^2)}$, which is roughly
$\frac{\operatorname{area}(B_x)}{\operatorname{area}(A_x)}$. The expected number of primality tests is about $\frac{\operatorname{area}(A_x)}{\# A(x)}$. By Theorem 5.3
this is for many notions in $O(\ln^2 x)$. We have seen implementations (for example the one of
GnuPG) where the inner and outer loop have been exchanged. This increases the number of
primality tests by the repetition count of the inner loop. For $A^{FB(r,1)}$ this is a factor of about

$$\frac{\# A^{FB(r^2,0)}(rx)}{\# A^{FB(r,1)}(x)} \approx \frac{\frac1r - 2 + r}{\ln r + \frac1r - 1} = \frac{(r-1)^2}{r (\ln r - 1) + 1},$$

which for r = 2 is approximately 2.588 and even worse for larger r.* Also easily checkable
additional conditions, like gcd((p − 1)(q − 1), e) = 1, should be checked before the primality
tests to improve the efficiency.
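The overhead factor is easy to evaluate. The sketch below assumes the closed form $(r-1)^2 / (r (\ln r - 1) + 1)$, i.e. the area of the square $\left]\sqrt{x/r}, \sqrt{rx}\right]^2$ divided by the area $x (\ln r - 1 + 1/r)$ of the part below the hyperbola $yz = x$; the function name is hypothetical.

```python
import math


def swapped_loop_overhead(r: float) -> float:
    """Area of the enclosing square divided by the area of the notion inside it."""
    return (r - 1) ** 2 / (r * (math.log(r) - 1) + 1)
```

For r = 2 this evaluates to about 2.5887, and the value grows with r.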



*Side remark: to indicate how a real number was rounded we append a special symbol. Examples: vr = 
3.14J, = 3.142V = 3.1416T = 3.14159A. The height of the platform shows the size of the left-out part and 
the direction of the antenna indicates whether actual value is larger or smaller than displayed. We write, say, 
e = 2.72T = 2.7in as if the shorthand were exact. 






6.2. Inverse transform sampling. Actually we would like to avoid generating out-of- 
bound pairs completely. Then a straightforward attempt to construct such an algorithm looks 
the following way: 

Algorithm 6.3. Generating an RSA integer (non-uniform version).

Input: A notion A, a bound $x \in \mathbb{R}_{>1}$.
Output: An integer n = pq with $(p, q) \in A_x$.

1. Repeat
2.   Select y uniformly at random from $\{ y \in \mathbb{R} \mid \exists z \in \mathbb{N} \colon (y, z) \in A_x \} \cap \mathbb{N}$.
3. Until y prime.
4. $p \leftarrow y$.
5. Repeat
6.   Select z uniformly at random from $\{ z \in \mathbb{R} \mid (p, z) \in A_x \} \cap \mathbb{N}$.
7. Until z prime.
8. $q \leftarrow z$.
9. Return pq.

The main problem with Algorithm 6.3 is that the output it produces typically is not uniform
since the sets $\{ z \in \mathbb{R} \mid (p, z) \in A_x \} \cap \mathbb{N}$ do not necessarily have the same cardinality when
changing p. To retain uniform selection, we need to select the primes p non-uniformly with
the following distribution:

Definition 6.4. Let A be a notion of RSA integers with tolerance r. For every $x \in \mathbb{R}_{>1}$
the associated cumulative distribution function of $A_x$ is defined as

$$F_{A_x} \colon \mathbb{R} \to [0, 1], \quad y \mapsto \frac{\operatorname{area}(A_x \cap ([1, y] \times \mathbb{R}))}{\operatorname{area}(A_x)}.$$

In fact we should use the function $G_{A_x} \colon \mathbb{R} \to [0, 1]$, $y \mapsto \frac{\#(A_x \cap (([1, y] \cap \mathbb{P}) \times \mathbb{P}))}{\# A(x)}$ to
compute the density but computing $G_{A_x}$ (or its inverse) is tremendously expensive. Fortunately,
by virtue of Lemma 3.6 we know that $F_{A_x}$ approximates $G_{A_x}$ quite well for monotone,
$[c_1, c_2]$-balanced notions A. So we use the function $F_{A_x}$ to capture the distribution
properties of a given notion of RSA integers. As can be seen by inspection, in practically
relevant examples this function is sufficiently easy to handle, see Table 6.1. Using this we
modify Algorithm 6.3 such that each element from $A_x$ is selected almost uniformly at random:

Algorithm 6.5. Generating an RSA integer.

Input: A notion A, a bound $x \in \mathbb{R}_{>1}$.
Output: An integer n = pq with $(p, q) \in A_x$.

1. Repeat
2.   Select y with distribution $F_{A_x}$ from $\{ y \in \mathbb{R} \mid \exists z \colon (y, z) \in A_x \} \cap \mathbb{N}$.
3. Until y prime.
4. $p \leftarrow y$.
5. Repeat
6.   Select z uniformly at random from $\{ z \in \mathbb{R} \mid (p, z) \in A_x \} \cap \mathbb{N}$.
7. Until z prime.
8. $q \leftarrow z$.
9. Return pq.

As desired, this algorithm generates any pair $(p, q) \in A_x \cap (\mathbb{P} \times \mathbb{P})$ with almost the same
probability. In order to generate y with distribution $F_{A_x}$ one can use inverse transform
sampling, see for example Knuth (1998):

Theorem 6.6 (Inverse transform sampling). Let F be a continuous cumulative distribution
function with inverse $F^{-1}$ for $u \in [0, 1]$ defined by

$$F^{-1}(u) := \inf \{ x \in \mathbb{R} \mid F(x) = u \}.$$

If U is uniformly distributed on [0, 1], then $F^{-1}(U)$ follows the distribution F.

Proof. We have $\operatorname{prob}(F^{-1}(U) \le x) = \operatorname{prob}(U \le F(x)) = F(x)$. □
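As a small illustration of Theorem 6.6, consider a distribution whose density is proportional to $1/y$ on $\left]\sqrt{x}/r, \sqrt{x}\right]$. This is an assumption made for the example; it is the area-based density of a notion whose slice at y has length proportional to $1/y$. Its cumulative distribution function and inverse have closed forms; the function names are hypothetical.

```python
import math
import random


def cdf(y: float, x: float, r: float) -> float:
    """F(y) = ln(y * r / sqrt(x)) / ln(r) on [sqrt(x)/r, sqrt(x)], clamped outside."""
    lo, hi = math.sqrt(x) / r, math.sqrt(x)
    if y <= lo:
        return 0.0
    if y >= hi:
        return 1.0
    return math.log(y / lo) / math.log(r)


def cdf_inverse(u: float, x: float, r: float) -> float:
    """F^{-1}(u) = (sqrt(x) / r) * r**u for u in [0, 1]."""
    return math.sqrt(x) / r * r ** u


def sample(x: float, r: float, rng=random) -> float:
    """Inverse transform sampling: apply the inverse to a uniform variate."""
    return cdf_inverse(rng.random(), x, r)
```

When no closed-form inverse is available, a bisection on F is a common substitute, at the price of a few extra evaluations of F per sample.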

The expected number of primality tests now is in $O(\ln x)$: If A is $[c_1, 1]$-balanced then
$F_{A_x}(y) = 0$ as long as $y \le x^{c_1}$. The exit probability of the first loop is prob(y prime) where
y is chosen according to the distribution $F_{A_x}$. Thus

$$\operatorname{prob}(y \text{ prime}) \approx \int_1^x \frac{F'_{A_x}(y)}{\ln y} \, dy \in \left[ \frac{1}{\ln x}, \frac{1}{c_1 \ln x} \right]$$

and we expect $O(\ln x) \cap \Omega(c_1 \ln x)$ repetitions of the upper loop until y is prime. Of course
we have to take into account that for each trial u an inverse $F_{A_x}^{-1}(u)$ has to be computed, at
least approximately, yet this cost is usually negligible compared to a primality test, see
Table 6.1.
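The structure of Algorithm 6.5 can be sketched for one concrete case. The assumptions here are: the notion consists of the pairs with $\sqrt{x}/r < y \le \sqrt{x}$ and $x/(ry) < z \le x/y$, so that the density of y is proportional to $1/y$ and the inverse of the cumulative distribution function is $u \mapsto (\sqrt{x}/r) \, r^{u}$; trial division replaces a real primality test; all function names are hypothetical.

```python
import math
import random


def is_prime(n: int) -> bool:
    """Trial division; adequate only for the toy sizes used in this sketch."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def generate_pair(x: int, r: int = 2, rng=random) -> tuple[int, int]:
    """Steps 1-9 of the algorithm for the assumed notion (see lead-in)."""
    sqrt_x = math.sqrt(x)
    while True:  # steps 1-3: first prime, drawn by inverse transform sampling
        y = math.ceil(sqrt_x / r * r ** rng.random())
        # Exact integer re-check of sqrt(x)/r < y <= sqrt(x) guards against rounding.
        if x < r * r * y * y and y * y <= x and is_prime(y):
            p = y
            break
    lo, hi = x // (r * p) + 1, x // p  # integers z with x/(r*p) < z <= x/p
    while True:  # steps 5-7: second prime, drawn uniformly
        z = rng.randint(lo, hi)
        if is_prime(z):
            return p, z
```

No pair is ever rejected for lying outside the notion, so the only repetitions are due to the primality tests.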



6.3. Other constructions. There are variants around, where the primes are selected
differently: Take an integer randomly from a suitable interval and increase the result until the
first prime is found. This has the advantage that the amount of randomness needed is
considerably lower, and with some optimization the resulting algorithm can also be made much
faster. The price one has to pay is that the produced primes will not be selected uniformly at
random: Primes p for which p − 2 is also prime will be selected with a much lower probability
than randomly selected primes of a given length. As shown in Brandt & Damgård (1993) the
output entropy of such algorithms is still almost maximal and also generators based on this
kind of prime generator might be used in practice.
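The bias of the incremental search is easy to make visible on a toy interval. The sketch below enumerates every possible starting value and records which prime the search ends at; the function names and the interval are assumptions chosen for illustration.

```python
from collections import Counter


def is_prime(n: int) -> bool:
    """Trial division; adequate only for the toy sizes used in this sketch."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def first_prime_from(start: int) -> int:
    """Incremental search: the first prime equal to or larger than start."""
    n = start
    while not is_prime(n):
        n += 1
    return n


def incremental_search_counts(lo: int, hi: int) -> Counter:
    """For each prime, the number of starting values in [lo, hi] that lead to it."""
    return Counter(first_prime_from(s) for s in range(lo, hi + 1))
```

With starting values 100 to 199, the prime 103 (whose predecessor 101 is also prime) is reached from 2 starting values, while 127 (which follows the gap after 113) is reached from 14.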
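Notion $A^{DM(r)}$, density:
$$\begin{cases} \dfrac{2 (r^2 y^2 - x)}{x y (r - 1) \ln r} & \text{if } \frac{\sqrt{x}}{r} < y \le \sqrt{\frac{x}{r}}, \\[1ex] \dfrac{2 r (x - y^2)}{x y (r - 1) \ln r} & \text{if } \sqrt{\frac{x}{r}} < y \le \sqrt{x}, \\[1ex] 0 & \text{otherwise.} \end{cases}$$

Notion $A^{FB(r,\sigma)}$, density:
$$\begin{cases} \dfrac{\sqrt{r} \left( r^{\frac{1+\sigma}{2}} - 1 \right)}{\sqrt{x} \left( \sigma r \ln r + r - 2 r^{\frac{1+\sigma}{2}} + 1 \right)} & \text{if } \sqrt{\frac{x}{r}} < y \le \sqrt{\frac{x}{r^{\sigma}}}, \\[1ex] \dfrac{\sqrt{r} \left( \sqrt{rx} - y \right)}{\sqrt{x} \, y \left( \sigma r \ln r + r - 2 r^{\frac{1+\sigma}{2}} + 1 \right)} & \text{if } \sqrt{\frac{x}{r^{\sigma}}} < y \le \sqrt{r^{\sigma} x}, \\[1ex] 0 & \text{otherwise.} \end{cases}$$

Notion $A^{ALG_1(r)}$, density:
$$\begin{cases} \dfrac{1}{y \ln r} & \text{if } \frac{\sqrt{x}}{r} < y \le \sqrt{x}, \\[1ex] 0 & \text{otherwise.} \end{cases}$$

Table 6.1: Non-cumulative density functions with respect to y.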



6.4. Comparison. We have seen that Algorithm 6.2 and 6.5 are practical uniform genera- 
tors for any symmetric or antisymmetric notion. 

Note that Algorithm 6.2 and 6.5 may, however, still produce numbers in a non-uniform
fashion: In the last step of both algorithms a product is computed that corresponds to either
one pair or two pairs in $A_x$. To solve this problem we have two choices: Either we replace A
by its symmetric version S defined by $S_x := \{ (y, z) \in \mathbb{R}^2_{>1} \mid (y, z) \in A_x \vee (z, y) \in A_x \}$,
or by its, say, top half T given by $T_x := \{ (y, z) \in S_x \mid z \ge y \}$ before anything else.

It is now relatively simple to instantiate the above algorithms using the notions proposed
in Section 4: Namely for an algorithm following the Las Vegas approach, one simply needs
to find a suitable banner that encloses the desired notion. In order to instantiate Algorithm 6.5
we need to determine the inverse of the corresponding cumulative distribution function for
the respective notion (see Table 6.1). Still Algorithm 6.2 and 6.5 are practically uniform
generators for any symmetric or antisymmetric notion. Considering run-times we observe
that Algorithm 6.5 is much faster, but we have to use inverse transform sampling to generate
the first prime. Despite the simplicity of the approaches some common implementations use
corrupted versions of Algorithm 6.2 or 6.5 as explained below. Essentially, they buy some
extra simplicity by relaxing the uniformity requirement.

7. Output entropy 

The entropy of the output distribution is an important quality measure of a generator. For
primality tests several analyses were performed, see for example Brandt & Damgård (1993)
or Joye & Paillier (2006). For generators of RSA integers we are not aware of any work in
this direction.

Let $A_x$ be any monotone notion. Consider a generator $G_{\varrho}$ that produces a pair of primes
$(p, q) \in A_x$ with distribution $\varrho$. Seen as random variables, $G_{\varrho}$ induces two random variables
P and Q by its first and second coordinate, respectively. The entropy of the generator
$G_{\varrho}$ is given by

(7.1) $H(G_{\varrho}) = H(P \times Q) = H(P) + H(Q \mid P)$,

where H denotes the entropy and the conditional entropy is given by

$$H(Q \mid P) = - \sum_{p} \operatorname{prob}(P = p) \sum_{q} \operatorname{prob}(Q = q \mid P = p) \log_2(\operatorname{prob}(Q = q \mid P = p)).$$

If $\varrho$ is the uniform distribution U we obtain the maximal entropy, which we can approximate
by Lemma 3.6, namely

$$H(G_U) = \log_2(\# A(x)) \approx \log_2(\operatorname{area}(A_x)) - \log_2(\ln x) + 1$$

with an error of very small order. The algorithms from Section 6, however, return the product
$P \cdot Q$. The entropy of this random variable is at most $H(P \times Q)$ and can be at most one bit
smaller than this:

$$H(P \cdot Q) = - \sum_{n} \operatorname{prob}(P \cdot Q = n) \log_2(\operatorname{prob}(P \cdot Q = n))$$
$$\ge - \sum_{(p,q)} \operatorname{prob}(P \times Q = (p, q)) \log_2(2 \operatorname{prob}(P \times Q = (p, q)))$$
$$= H(P \times Q) - 1.$$
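The one-bit bound can be observed on toy distributions. The sketch below computes the entropy of a distribution on pairs and the entropy of the induced distribution on products; the function names and the example distributions are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def entropy(probs) -> float:
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def pair_and_product_entropy(dist: dict[tuple[int, int], float]) -> tuple[float, float]:
    """Return (H(P x Q), H(P * Q)) for a distribution on pairs."""
    products: defaultdict[int, float] = defaultdict(float)
    for (p, q), prob in dist.items():
        products[p * q] += prob   # (p, q) and (q, p) collapse to the same product
    return entropy(dist.values()), entropy(products.values())
```

For the uniform distribution on the symmetric set {(3,5), (5,3), (3,7), (7,3)} the pair entropy is 2 bits and the product entropy is 1 bit, so the bound is attained.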

One should note here that in real-world implementations the generation of the primes might
be biased, for example when one uses the above mentioned quite natural generator PRIMEINC,
analyzed in Brandt & Damgård (1993). PRIMEINC chooses an integer and then outputs the
first prime equal to or larger than this number. Note that Algorithm 6.2 and Algorithm 6.5
do not depend on any prime generator but sample integers until they are prime. However,
this is not the case in many standards and implementations discussed in Section 9.

If an RSA generator $G = P \times Q$ employs prime generators P and Q with entropy-loss at most
$\varepsilon_P$ and $\varepsilon_Q$, then the resulting generator will by (7.1) have an entropy-loss of at most
$\varepsilon_P + \varepsilon_Q$ when compared to the same generator employing generators that produce primes
uniformly at random.

Interestingly, some of the standards and implementations in Section 9 (like the standard
IEEE 1363-2000 or the implementation of GNU Crypto) still do not generate every possible
outcome with the same probability, even if uniform prime generators are employed:
Namely, if one selects the prime p uniformly at random and afterwards the prime q uniformly
at random from an appropriate interval then this might be a non-uniform selection
process if for some choices of p there are fewer choices for q.

If in general the probability distribution $\varrho$ is close to the uniform distribution, say
$\varrho(p, q) \in [2^{-\varepsilon}, 2^{\varepsilon}] \cdot \frac{1}{\# A(x)}$ for some fixed $\varepsilon \in \mathbb{R}_{>0}$, then the entropy of the resulting
generator $G_{\varrho}$ can be estimated as follows:

$$H(G_{\varrho}) = - \sum_{(p,q) \in A_x} \varrho(p, q) \log_2(\varrho(p, q))$$
$$\in \sum_{(p,q) \in A_x} \varrho(p, q) \left[ \log_2(\# A(x)) - \varepsilon, \log_2(\# A(x)) + \varepsilon \right]$$
$$= [H(G_U) - \varepsilon, H(G_U) + \varepsilon]$$

and since the entropy of the uniform distribution is maximal, this implies that

$$H(G_U) - \varepsilon \le H(G_{\varrho}) \le H(G_U).$$
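For a two-stage selection (first p uniformly, then q uniformly from an interval depending on p) the entropy-loss can be computed exactly on toy sizes. In the sketch below the interval endpoints are an assumption chosen for illustration: p is taken from $[2^{\lfloor (k-1)/2 \rfloor}, 2^{\lfloor (k+1)/2 \rfloor} - 1]$ and q from $\left]2^{k-1}/p, 2^{k}/p\right]$. The function names are hypothetical.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division; adequate only for the toy sizes used in this sketch."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def two_stage_entropy_loss(k: int) -> float:
    """log2(#pairs) minus the entropy of 'p uniform, then q uniform given p'."""
    p_lo, p_hi = 2 ** ((k - 1) // 2), 2 ** ((k + 1) // 2) - 1
    choices = {}
    for p in range(p_lo, p_hi + 1):
        if is_prime(p):
            n_q = sum(1 for q in range(2 ** (k - 1) // p + 1, 2 ** k // p + 1) if is_prime(q))
            if n_q > 0:
                choices[p] = n_q
    m = len(choices)
    h_uniform = math.log2(sum(choices.values()))
    # Each pair (p, q) has probability 1 / (m * choices[p]).
    h_actual = math.log2(m) + sum(math.log2(n) for n in choices.values()) / m
    return h_uniform - h_actual
```

The loss equals the base-2 logarithm of the ratio of the arithmetic to the geometric mean of the numbers of choices for q, so it vanishes exactly when every p admits the same number of q.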



8. Complexity theoretic considerations 



We are about to reduce factoring products of two comparatively equally sized primes to the
problem of factoring integers generated from a sufficiently large notion. As far as we know
there are no similar reductions in the literature.

We consider finite sets $M \subset \mathbb{N} \times \mathbb{N}$; in our situation we actually have only prime pairs.
The multiplication map $\mu_M$ is defined on M and merely multiplies, that is, $\mu_M \colon M \to \mathbb{N}$,
$(y, z) \mapsto y \cdot z$. The random variable $U_M$ outputs uniformly distributed values from M.
An attacking algorithm F gets a natural number $\mu_M(U_M)$ and attempts to find factors inside
M. Its success probability

(8.1)
$$\operatorname{succ}_F(M) = \operatorname{prob}\left( F(\mu_M(U_M)) \in \mu_M^{-1}(\mu_M(U_M)) \right)$$

measures its quality in any fixed-size scenario. We call a countably infinite family C of
finite sets of pairs of natural numbers hard to factor iff for any probabilistic polynomial time
algorithm F and any exponent s for all but finitely many $M \in C$ the success probability
$\operatorname{succ}_F(M) < \ln^{-s} x$ where $x = \max \mu_M(M)$. In other words: the success probability of
any probabilistic polynomial time factoring algorithm on a number chosen uniformly from
$M \in C$ is negligible. That is equivalent to saying that the function family $(\mu_M)_{M \in C}$ is
one-way.

Integers generated from a notion A are hard to factor iff for any sequence $(x_i)_{i \in \mathbb{N}}$ tending
to infinity the family $(A_{x_i} \cap (\mathbb{P} \times \mathbb{P}))_{i \in \mathbb{N}}$ is hard to factor. This definition is equivalent
to the requirement that for all probabilistic polynomial time machines F, all $s \in \mathbb{N}$, there
exists a value $x_0 \in \mathbb{R}_{>1}$ such that for any $x > x_0$ we have $\operatorname{succ}_F(A_x) < \ln^{-s} x$. Since $\mathbb{R}$ is
first-countable, both definitions are actually equal. This can be easily shown by considering
the functions $g_{s,F} \colon \mathbb{R}_{>1} \to \mathbb{R}$, $x \mapsto \operatorname{succ}_F(A_x) \cdot \ln^{s} x$. The first definition says that each
function $g_{s,F}$ is sequentially continuous (after swapping the initial universal quantifiers).
The second definition says that each function $g_{s,F}$ is continuous. In first-countable spaces
sequentially continuous is equivalent to continuous.

For any polynomial f we define the set $R_f = \{ (m, n) \in \mathbb{N}^2 \mid m \le f(n) \wedge n \le f(m) \}$ of
f-related positive integer pairs. Denote by $\mathbb{P}^{(m)}$ the set of m-bit primes. We can now formulate
the basic assumption:
Assumption 8.2 (Intractability of factoring). For any unbounded positive polynomial f
integers from the f-related prime pair family $(\mathbb{P}^{(m)} \times \mathbb{P}^{(n)})_{(m,n) \in R_f}$ are hard to factor.

This is exactly the definition given by Goldreich (2001). Note that this assumption implies
that factoring in general is hard, and it covers the supposedly hardest factoring instances.
Now we are ready to state that integers from all relevant notions are hard to factor.

Theorem 8.3. Let $\ln r \in \Omega\left( \frac{1}{\ln^{\ell} x} \right)$ for some $\ell$ and A be a piecewise monotone, $[c_1, c_2]$-balanced
notion for RSA integers of tolerance r, where $c_1$ is bounded away from zero with
growing x, and A has large area, namely, for some k and large x we have area $A_x \ge \frac{x}{\ln^{k} x}$.
Assume that factoring is difficult in the sense of Assumption 8.2. Then integers from the
notion A are hard to factor.

Actually, under the given conditions Assumption 8.2 can be weakened: we only need that 
integers from the family of linearly related prime pairs are hard to factor. There is a tradeoff 
between the strength of the needed assumption on factoring and the assumption on ci. If 
we relax the restriction on ci in the statement of the theorem to the requirement that cf In x 
tends to infinity with growing x, we need that integers from the family of quadratically 
related prime pairs are hard to factor. 

Proof. Assume that we have an algorithm F that factors integers generated uniformly
from the notion A. Our goal is to prove that this algorithm also factors certain polynomially
related prime pairs successfully. In other words: its existence contradicts the assumption
that factoring in the form of Assumption 8.2 is difficult.

By assumption, there is an exponent s so that for any $x_0$ there is $x > x_0$ such that the
assumed algorithm F has success probability $\operatorname{succ}_F(A_x) \ge \ln^{-s} x$ on inputs from $A_x$. We
are going to prove that for each such x there exists a pair $(m_0, n_0)$, the entries both from the
interval $[c_1 \ln x - \ln 2, c_2 \ln x + \ln 2]$, such that F executed with an input from image
$\mu_{\mathbb{P}^{(m_0)} \times \mathbb{P}^{(n_0)}}$ still has success probability at least $\ln^{-(k+s+2)} x$. By the interval restriction,
$m_0$ and $n_0$ are polynomially (even linearly) related, namely $m_0 \le \frac{2 c_2}{c_1} n_0$ and
$n_0 \le \frac{2 c_2}{c_1} m_0$ for large x. By the assumption on $c_1$ the fraction $\frac{c_2}{c_1}$ is bounded independent
of x. So that contradicts Assumption 8.2.

First, we cover the set $A_x$ with small rectangles. Let $S_{m,n} := \mathbb{P}^{(m)} \times \mathbb{P}^{(n)}$ and
$I_x := \{ (m, n) \in \mathbb{N}^2 \mid S_{m,n} \cap A_x \ne \emptyset \}$, then

(8.4)
$$A_x \cap \mathbb{P}^2 \subset \biguplus_{(m,n) \in I_x} S_{m,n} =: S_x.$$

Next we give an upper bound on the number $\# S_x$ of prime pairs in the set $S_x$ in terms
of the number $\# A(x)$ of prime pairs in the original notion: First, since each rectangle $S_{m,n}$
extends by a factor 2 along each axis we overshoot by at most that factor in each direction,
that is, we have for $c_1' = c_1 - (1 + 2c_1) \frac{\ln 2}{\ln 4x}$ and all $x \in \mathbb{R}_{>1}$

$$S_x \subset M^{16r, c_1'}_{4x} = \left\{ (y, z) \in \mathbb{R}^2_{>1} \;\middle|\; y, z \ge \tfrac12 x^{c_1} \wedge \frac{x}{4r} < yz \le 4x \right\}.$$

Provided x is large enough we can guarantee by Theorem 5.2 that

$$\# S_x \le \# M^{16r, c_1'}(4x) \le \frac{8x}{c_1' \ln x}.$$

On the other hand we apply Lemma 3.6 for the notion $A_x$ and use that $A_x$ is large
by assumption. Let $c = \max(2c_2 - 1, 1 - 2c_1)$. Then we obtain for large x with some
$\hat{e}(x) \in O\left( c_1^{-1} x^{\frac{3+c}{4}} \right)$

$$\# A(x) \ge \frac{\operatorname{area}(A_x)}{c_2^2 \ln^2 x} - \hat{e}(x) \ge \frac{x}{2 c_2^2 \ln^{k+2} x}.$$

Together we obtain

(8.5)
$$\frac{\# A(x)}{\# S_x} \ge \frac{c_1'}{16 c_2^2 \ln^{k+1} x} \ge \ln^{-(k+2)} x.$$

By assumption we have $\operatorname{succ}_F(A_x) \ge \ln^{-s} x$ for infinitely many values x. Thus F on
an input from $S_x$ still has large success even if we ignore that F might be successful for
elements on $S_x \setminus A_x$,

$$\operatorname{succ}_F(S_x) \ge \operatorname{succ}_F(A_x) \frac{\# A(x)}{\# S_x} \ge \ln^{-(k+s+2)} x.$$

Finally choose $(m_0, n_0) \in I_x$ for which the success of F on $S_{m_0,n_0}$ is maximal. Then
$\operatorname{succ}_F(S_{m_0,n_0}) \ge \operatorname{succ}_F(S_x)$. Combining with the previous we obtain that for infinitely
many x there is a pair $(m_0, n_0)$ where the success $\operatorname{succ}_F(S_{m_0,n_0})$ of F on inputs from
$S_{m_0,n_0}$ is still larger than inverse polynomial: $\operatorname{succ}_F(S_{m_0,n_0}) \ge \ln^{-(k+s+2)} x$.

For these infinitely many pairs $(m_0, n_0)$ the success probability of the algorithm F on
$S_{m_0,n_0}$ is at least $\ln^{-(k+s+2)} x$ contradicting the hypothesis. □

All the specific notions that we have found in the literature fulfill the criterion of
Theorem 8.3. Thus if factoring is difficult in the stated sense then each of them is invulnerable
to factoring attacks. Note that the above reduction still works if the primes p, q are subject to
the side condition gcd((p − 1)(q − 1), e) = 1 for a fixed integer e (see Theorem 3.11). We
suspect that this is also the case when one employs safe primes.

Mihailescu (2001) shows a theorem which is in some respect more general than our
considerations and seems to imply our Theorem 8.3. To that end one has to show that one of
$A_x$ and $\mathbb{P}^{(m)} \times \mathbb{P}^{(n)}$, with suitably chosen (m, n) depending on x, is 'polynomially dense'
in the other. The result would be more general since also the used distribution on $A_x$, rather
than being uniform, is allowed to be 'polynomially bounded'. Our proof is of a different
nature and thus may well be of independent interest. Also it may be adapted to polynomially
bounded distributions on A.
9. Impact on standards and implementations

In order to get an understanding of the common implementations, it is necessary to consult
the main standard on RSA integers, namely the standard PKCS#1 (Jonsson & Kaliski 2003).
However, one cannot find any requirements on the shape of RSA integers there. Interestingly,
they even allow more than two factors for an RSA modulus. Also the standard ISO
18033-2 (International Organization for Standards 2006) does not give any details besides
the fact that it requires the RSA integer to be a product of two different primes of similar
length. A more precise standard is set by the German Bundesnetzagentur (Wohlmacher
2009). They do not give a specific algorithm, but at least require that the prime factors are
not too small and not too close to each other. We will now analyze several standards which
give a concrete algorithm for generating an RSA integer. In particular, we consider the
standard of the RSA foundation (RSA Laboratories 2000), the IEEE standard 1363 (IEEE
working group 2000), the NIST standard FIPS 186-3 (NIST 2009), the standard ANSI X9.44
(Accredited Standards Committee X9 2007) and the standard resulting from the European
NESSIE project (Preneel et al. 2003).

9.1. RSA-OAEP. The RSA Laboratories (2000) describe the following variant: 

Algorithm 9.1. Generating an RSA number for RSA-OAEP and variants.

Input: A number of bits $k$, the public exponent $e$.
Output: A number $n = pq$.

1. Pick $p$ from $[\lfloor 2^{(k-1)/2} \rfloor + 1, \lceil 2^{k/2} \rceil - 1] \cap \mathbb{P}$ such that
   $\gcd(e, p - 1) = 1$.

2. Pick $q$ from $[\lfloor 2^{(k-1)/2} \rfloor + 1, \lceil 2^{k/2} \rceil - 1] \cap \mathbb{P}$ such that
   $\gcd(e, q - 1) = 1$.

3. Return $pq$.

This will produce a number from the interval $[2^{k-1} + 1, 2^k - 1]$ with no cutting off. The output
entropy is maximal. So this corresponds to the notion $\mathcal{A}^{\mathrm{FB}(2,0)}$ generated by Algorithm 6.5.
The standard requires an expected number of $k \ln 2$ primality tests if the gcd condition is
checked first. Otherwise the expected number of primality tests increases to a multiple of
$k \ln 2$ determined by $e$, see (3.12). In the following, the above notation will always mean that the gcd
condition is checked first and the number is tested for primality only afterwards. For the security
Theorem 8.3 applies.
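
The steps of Algorithm 9.1 can be sketched in Python as follows. This is a minimal illustration and not the text of the standard: the helper names `is_probable_prime`, `pick_prime` and `rsa_oaep_modulus` are hypothetical, the default exponent 65537 is an assumption, the Miller-Rabin test stands in for whatever primality test an implementation would use, and Python's `random` module is not a cryptographically secure source of randomness. The interval bounds are computed exactly with integer square roots.

```python
import math
import random

def is_probable_prime(n, rounds=32):
    """Miller-Rabin test; a composite passes with probability <= 4**-rounds."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13):
        if n % small == 0:
            return n == small
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def pick_prime(lo, hi, e):
    """Uniform prime p in [lo, hi] with gcd(e, p - 1) = 1, by rejection sampling."""
    while True:
        p = random.randint(lo, hi)
        # Cheap gcd condition first, primality test only afterwards.
        if math.gcd(e, p - 1) == 1 and is_probable_prime(p):
            return p

def rsa_oaep_modulus(k, e=65537):
    """Sketch of Algorithm 9.1; e must be odd, otherwise the loop never ends."""
    lo = math.isqrt(2 ** (k - 1)) + 1   # floor(2^((k-1)/2)) + 1
    hi = math.isqrt(2 ** k - 1)         # ceil(2^(k/2)) - 1
    p = pick_prime(lo, hi, e)
    q = pick_prime(lo, hi, e)
    return p, q, p * q

p, q, n = rsa_oaep_modulus(64)
print(n.bit_length())
```

Because both primes exceed $2^{(k-1)/2}$ and stay below $2^{k/2}$, every product has exactly $k$ bits, so no candidate pair is ever discarded.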




9.2. IEEE. IEEE standard 1363-2000, Annex A.16.11 (IEEE working group 2000) introduces
our algorithmic proposal:
Algorithm 9.2. Generating an RSA number, IEEE 1363-2000.

Input: A number of bits $k$, the odd public exponent $e$.
Output: A number $n = pq$.

1. Pick $p$ from $[2^{\lfloor (k-1)/2 \rfloor}, 2^{\lfloor (k+1)/2 \rfloor} - 1] \cap \mathbb{P}$ such that
   $\gcd(e, p - 1) = 1$.

2. Pick $q$ from $[\lfloor 2^{k-1}/p \rfloor + 1, \lfloor 2^k/p \rfloor] \cap \mathbb{P}$ such that
   $\gcd(e, q - 1) = 1$.

3. Return $pq$.


Since the resulting integers are in the interval $[2^{k-1}, 2^k - 1]$, this standard follows $\mathcal{A}^{\mathrm{ALG}(2,1/2)}$
generated by a corrupted variant of Algorithm 6.5 using an expected number of $k \ln 2$
primality tests like the RSA-OAEP standard. The notion it implements is neither symmetric
nor antisymmetric. The selection of the integers is not done in a uniform way, since the
number of possible $q$ for the largest possible $p$ is roughly half of the corresponding number
for the smallest possible $p$. Since the distribution of the outputs is close to uniform, we can
use the techniques from Section 7 to estimate the output entropy and find that the entropy loss
is less than 0.69 bit. The (numerically approximated) values in Table 9.1 gave an actual
entropy loss of approximately 0.03 bit.
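
The factor of roughly two between the number of admissible $q$ for the smallest and for the largest $p$ can be checked by brute force for a toy size. The following Python sketch assumes the intervals $[2^{\lfloor (k-1)/2 \rfloor}, 2^{\lfloor (k+1)/2 \rfloor} - 1]$ for $p$ and $[\lfloor 2^{k-1}/p \rfloor + 1, \lfloor 2^k/p \rfloor]$ for $q$, ignores the gcd condition on $e$, and uses the arbitrarily chosen toy value $k = 20$; the function names are hypothetical.

```python
def is_prime(n):
    """Trial division; adequate for the toy sizes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def count_q(p, k):
    """Number of primes q in [floor(2^(k-1)/p) + 1, floor(2^k/p)]."""
    lo = 2 ** (k - 1) // p + 1
    hi = 2 ** k // p
    return sum(1 for q in range(lo, hi + 1) if is_prime(q))

k = 20
p_lo, p_hi = 2 ** ((k - 1) // 2), 2 ** ((k + 1) // 2) - 1
primes_p = [p for p in range(p_lo, p_hi + 1) if is_prime(p)]
smallest, largest = primes_p[0], primes_p[-1]
ratio = count_q(smallest, k) / count_q(largest, k)
print(smallest, count_q(smallest, k), largest, count_q(largest, k), ratio)
```

Since each $p$ is chosen with the same probability but the number of matching $q$ shrinks as $p$ grows, products with a large $p$ are individually more likely, which is the source of the small entropy loss.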



9.3. NIST. We will now analyze the standard FIPS 186-3 (NIST 2009). In Appendix
B.3.1 of the standard one finds the following algorithm:



Algorithm 9.3. Generating an RSA number, FIPS 186-3.

Input: A number of bits $k$, a number of bits $\ell < k$, the odd public
       exponent $2^{16} < e < 2^{256}$.
Output: A number $n = pq$.

1. Pick $p$ from $[\sqrt{2} \cdot 2^{k/2-1}, 2^{k/2} - 1] \cap \mathbb{P}$ such that
   $\gcd(e, p - 1) = 1$ and $p \pm 1$ has a prime factor with at least $\ell$
   bits.

2. Pick $q$ from $[\sqrt{2} \cdot 2^{k/2-1}, 2^{k/2} - 1] \cap \mathbb{P}$ such that
   $\gcd(e, q - 1) = 1$ and $q \pm 1$ has a prime factor with at least $\ell$
   bits and $|p - q| > 2^{k/2-100}$.

3. Return $pq$.




In the standard it is required that the primes $p$ and $q$ shall be either provably prime or at
least probable primes. The mentioned large (at least $\ell$-bit) prime factors of $p \pm 1$ and $q \pm 1$
have to be provable primes. We observe that also in this standard a variant of the notion
$\mathcal{A}^{\mathrm{FB}(2,0)}$ generated by Algorithm 6.5 is used. The output entropy is thus maximal. However,
we do not have any restriction on the parity of $k$, so the value $k/2$ is not necessarily
an integer. Another interesting point is the restriction on the prime factors of $p \pm 1$, $q \pm 1$.
Our notions cannot directly handle such requirements, but this may possibly be achieved by
appropriately modifying the prime number densities in the proof of Lemma 3.6.

The standard requires an expected number of slightly more than $k \ln 2$ primality tests.
It is thus slightly less efficient than the RSA-OAEP standard. For the security the remarks
from the end of Section 8 apply.
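
The interval and distance conditions of Algorithm 9.3 can be tested with integer arithmetic only, which avoids rounding $\sqrt{2}$. The sketch below assumes an even bit length $k$ (an assumption of this example, not a restriction stated above); the name `fips_range_ok` is hypothetical, and the function deliberately checks neither primality nor the conditions on $e$ and on the prime factors of $p \pm 1$, $q \pm 1$.

```python
import math

def fips_range_ok(p, q, k):
    """Check the interval and distance conditions of Algorithm 9.3 for even k."""
    if k % 2 != 0:
        raise ValueError("this sketch assumes an even bit length k")
    half = k // 2

    def in_interval(x):
        # x >= sqrt(2) * 2^(k/2 - 1)  <=>  x^2 >= 2^(k-1)  for x > 0
        return x * x >= 2 ** (k - 1) and x <= 2 ** half - 1

    # |p - q| > 2^(k/2 - 100), multiplied through by 2^100 to stay in integers
    far_apart = abs(p - q) * 2 ** 100 > 2 ** half
    return in_interval(p) and in_interval(q) and far_apart

k = 1024
lo = math.isqrt(2 ** (k - 1)) + 1    # smallest integer in the interval
hi = 2 ** (k // 2) - 1               # largest integer in the interval
print(fips_range_ok(lo, hi, k))      # interval end points, far apart
print(fips_range_ok(lo, lo + 1, k))  # both in the interval, but too close
```

Any pair accepted by this check multiplies to a number with exactly $k$ bits, because both factors lie between $2^{(k-1)/2}$ and $2^{k/2}$.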

9.4. ANSI. The ANSI X9.44 standard (Accredited Standards Committee X9 2007), formerly
part of ANSI X9.31, requires strong primes for an RSA modulus. Unfortunately,
we could not access ANSI X9.44 directly and are therefore referring to ANSI X9.31-1998.
Section 4.1.2 of the standard requires that

o $p - 1$, $p + 1$, $q - 1$, $q + 1$ each should have prime factors $p_1, p_2, q_1, q_2$ that are randomly
  selected primes in the range $2^{100}$ to $2^{120}$,

o $p$ and $q$ shall be the first primes that meet the above, found in an appropriate interval,
  starting from a random point,

o $p$ and $q$ shall be different in at least one of their first 100 bits.

The additional restrictions are similar to the ones required by NIST. This procedure will have
an output entropy that is close to maximal (see Section 7).

9.5. NESSIE. The European NESSIE project gives in its security report (Preneel et al. 
2003) a very similar algorithm: 

Algorithm 9.4. Generating an RSA number, NESSIE standard.

Input: A number of bits $\ell$, the odd public exponent $e$.
Output: A number $n = pq$.

1. Pick $p$ from $[2^{\ell-1}, 2^{\ell} - 1] \cap \mathbb{P}$ such that
   $\gcd(e, p - 1) = 1$.

2. Pick $q$ from $[2^{\ell-1}, 2^{\ell} - 1] \cap \mathbb{P}$ such that
   $\gcd(e, q - 1) = 1$.

3. Return $pq$.

The resulting integer $n$ is from the interval $[2^{2\ell-2}, 2^{2\ell} - 1]$ and thus corresponds to the
fixed-bound notion $\mathcal{A}^{\mathrm{FB}(4,0)}$ generated by Algorithm 6.5. The output entropy is thus maximal.
Note the difference to the standard of the RSA foundation: Besides the fact that in
the standard of the RSA Laboratories some sort of rounding is done, the security parameter
$\ell$ is treated differently: While for the RSA foundation the security parameter describes the
(rough) length of the output, in the NESSIE proposal it denotes the size of the two prime
factors. For comparison let $k = 2\ell$. The standard requires an expected number of $2\ell \ln 2 = k \ln 2$
primality tests. It is thus as efficient as the RSA-OAEP standard. For the security
Theorem 8.3 applies.
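
The interval for $n$ and the count of primality tests for Algorithm 9.4 can be made plausible by the following short calculation. It is a heuristic sketch only: it uses the prime number theorem density $1/\ln x$ near $x = 2^{\ell}$ and ignores both the gcd condition on $e$ and all lower-order terms.

```latex
% Heuristic: near x = 2^\ell a random integer is prime with probability about 1/\ln x.
\begin{align*}
  2^{2\ell-2} = 2^{\ell-1} \cdot 2^{\ell-1} \;\le\; pq \;&\le\; (2^{\ell}-1)^2 < 2^{2\ell}, \\
  \mathbb{E}[\text{primality tests for one prime}] &\approx \ln 2^{\ell} = \ell \ln 2, \\
  \mathbb{E}[\text{primality tests for } p \text{ and } q] &\approx 2\ell \ln 2 = k \ln 2
  \qquad (k = 2\ell).
\end{align*}
```

The ratio of upper to lower bound for $n$ is about 4, which matches the first parameter of the fixed-bound notion named above.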
9.6. OpenSSL. We now turn to implementations: For OpenSSL (Cox et al. 2009), we
refer to the file rsa_gen.c. Note that in the configuration the routine used for RSA integer
generation can be changed, while the algorithm given below is the standard one. OpenSSH
(de Raadt et al. 2009) uses the same library. Refer to the file rsa.c. We have the following
algorithm:



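Algorithm 9.5. Generating an RSA number in OpenSSL.

Input: A number of bits $k$.
Output: A number $n = pq$.

1. Pick $p$ from $[2^{\lfloor (k-1)/2 \rfloor}, 2^{\lfloor (k+1)/2 \rfloor} - 1] \cap \mathbb{P}$.

2. Pick $q$ from $[2^{\lfloor (k-2)/2 \rfloor}, 2^{\lfloor k/2 \rfloor} - 1] \cap \mathbb{P}$.

3. Return $pq$.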




This is nothing but a rejection-sampling method with no rejections of a notion similar to
the fixed-bound notion $\mathcal{A}^{\mathrm{FB}(4,0)}$ generated by Algorithm 6.2. The output entropy is thus
maximal. The result the algorithm produces is always in $[2^{k-2}, 2^k - 1]$. It is clear that this
notion is antisymmetric and the factors are on average a factor 2 apart from each other. The
implementation runs in an expected number of $k \ln 2$ primality tests. The public exponent $e$
is afterwards selected such that $\gcd((p - 1)(q - 1), e) = 1$. It is thus slightly more efficient
than the RSA-OAEP standard. For the security Theorem 8.3 applies.



9.7. Openswan. In the open source implementation Openswan of the IPsec protocol
(Richardson et al. 2009) one finds a rejection-sampling method that is actually implementing
the notion $\mathcal{A}^{\mathrm{FB}(4,0)}$ generated by a variant of Algorithm 6.2. We refer to the function
rsasigkey in the file rsasigkey.c:



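Algorithm 9.6. Generating an RSA number in Openswan.

Input: A number of bits $k$.
Output: A number $n = pq$.

1. Pick $p$ from $[2^{\lfloor (k-2)/2 \rfloor}, 2^{\lfloor k/2 \rfloor} - 1] \cap \mathbb{P}$.

2. Pick $q$ from $[2^{\lfloor (k-2)/2 \rfloor}, 2^{\lfloor k/2 \rfloor} - 1] \cap \mathbb{P}$.

3. Return $pq$.
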

Note that here the notion is actually symmetric. However, the number $pq$, although selected
uniformly at random, will not always have the same length. The implementation runs in an
expected number of $k \ln 2$ primality tests and the output entropy is maximal. Again the public
exponent $e$ is afterwards selected such that $\gcd((p - 1)(q - 1), e) = 1$. It is thus as efficient
as the RSA-OAEP standard. For the security Theorem 8.3 applies.
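
That the length of $pq$ varies can be seen by enumerating all prime pairs for a toy size. The Python sketch below assumes that both primes are drawn from $[2^{\lfloor (k-2)/2 \rfloor}, 2^{\lfloor k/2 \rfloor} - 1]$; the function name `length_distribution` and the toy value $k = 16$ are choices made for this illustration.

```python
from collections import Counter

def is_prime(n):
    """Trial division; adequate for the toy sizes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def length_distribution(k):
    """Count the bit lengths of pq over all ordered prime pairs (p, q)."""
    lo, hi = 2 ** ((k - 2) // 2), 2 ** (k // 2) - 1
    primes = [x for x in range(lo, hi + 1) if is_prime(x)]
    return Counter((p * q).bit_length() for p in primes for q in primes)

dist = length_distribution(16)
print(sorted(dist.items()))
```

For even $k$ the products fall into $[2^{k-2}, 2^k - 1]$, so both $k - 1$ and $k$ bits occur; an application that needs exactly $k$ bits would have to add a length check.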
9.8. GnuPG. Also GnuPG (Skala et al. 2009) uses rejection-sampling of the fixed-bound
notion $\mathcal{A}^{\mathrm{FB}(2,1)}$ generated by a variant of Algorithm 6.2, implying that the entropy of its
output distribution is maximal.



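Algorithm 9.7. Generating an RSA number in GnuPG.

Input: A number of bits $k$.
Output: A number $n = pq$.

1. Repeat 2-3

2. Pick $p$ from $[2^{\lfloor (k-1)/2 \rfloor}, 2^{\lfloor (k+1)/2 \rfloor} - 1] \cap \mathbb{P}$.

3. Pick $q$ from $[2^{\lfloor (k-1)/2 \rfloor}, 2^{\lfloor (k+1)/2 \rfloor} - 1] \cap \mathbb{P}$.

4. Until $\mathrm{len}(pq) = 2 \lceil k/2 \rceil$

5. Return $pq$.

[Figure: region of admissible prime pairs; the hatched part marks the discarded outcomes.]
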

The hatched region in the picture above shows the possible outcomes that are discarded. We
refer here to the file rsa.c. The algorithm is given in the function generate_std and
always produces numbers with either $k$ or $k + 1$ bits depending on the parity of $k$. Note
that the generation procedure indeed first selects primes before checking the validity of the
range. This is of course a waste of resources, see Section 6.

The implementation runs in an expected number of roughly $2.589 \cdot (k + 1) \ln 2$ primality
tests. It is thus less efficient than the RSA-OAEP standard. Like in the other
implementations considered so far, the public exponent $e$ is afterwards selected such that
$\gcd((p - 1)(q - 1), e) = 1$. For the security Theorem 8.3 applies.
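
The repeat-until structure of Algorithm 9.7 can be sketched as follows. This is an illustration under stated assumptions and not GnuPG's code: the names `random_prime` and `gnupg_style_modulus` are hypothetical, trial division restricts it to toy sizes, and `random` is not a cryptographic generator. The returned attempt counter makes the wasted work visible, since both primes are generated before the length of the product is checked.

```python
import random

def is_prime(n):
    """Trial division; adequate for the toy sizes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def random_prime(lo, hi):
    """Uniform prime in [lo, hi] by rejection sampling."""
    while True:
        x = random.randint(lo, hi)
        if is_prime(x):
            return x

def gnupg_style_modulus(k):
    """Pick two primes of ceil(k/2) bits until the product has 2*ceil(k/2) bits."""
    m = (k + 1) // 2                   # ceil(k/2) bits per prime
    lo, hi = 2 ** (m - 1), 2 ** m - 1  # same interval as in steps 2 and 3
    attempts = 0
    while True:
        attempts += 1
        p, q = random_prime(lo, hi), random_prime(lo, hi)
        if (p * q).bit_length() == 2 * m:
            return p, q, p * q, attempts

p, q, n, attempts = gnupg_style_modulus(31)
print(n.bit_length(), attempts)
```

Every rejected pair costs two complete prime generations, which is why checking the range before testing for primality, as discussed in Section 6, saves work.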

9.9. GNU Crypto. The GNU Crypto library (Free Software Foundation 2009) generates
RSA integers in the following way. Refer here to the function generate in the file
RSAKeyPairGenerator.java.

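Algorithm 9.8. Generating an RSA number in GNU Crypto.

Input: A number of bits $k$.
Output: A number $n = pq$.

1. Pick $p$ from $[2^{\lfloor (k-1)/2 \rfloor}, 2^{\lfloor (k+1)/2 \rfloor} - 1] \cap \mathbb{P}$.

2. Repeat

3. Pick $q$ from $[2^{\lfloor (k-1)/2 \rfloor}, 2^{\lfloor (k+1)/2 \rfloor} - 1]$.

4. Until $\mathrm{len}(pq) = k$ and $q \in \mathbb{P}$.

5. Return $pq$.

[Figure: region of admissible prime pairs, with an arrow marking the more likely results.]
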

The arrow in the picture points to the results that will occur with higher probability. Also
here the notion $\mathcal{A}^{\mathrm{FB}(2,1)}$ is used, but the generated numbers will not be uniformly distributed,
since for a larger $p$ we have far fewer choices for $q$. Since the distribution of the outputs is
not close to uniform, we could only compute the entropy for real-world parameter choices
numerically (see Table 9.1). For all choices the loss was less than 0.63 bit. The implementation
is as efficient as the RSA-OAEP standard.

| Standard / Implementation | Notion | Entropy (entropy loss), k = 768 | k = 1024 | k = 2048 | Remarks |
|---|---|---|---|---|---|
| PKCS#1, ISO 18033-2 | Undefined | - | - | - | |
| ANSI X9.44, FIPS 186-3 | $\mathcal{A}^{\mathrm{FB}(2,0)}$ | < 747.34 | < 1002.51 | < 2024.51 | strong primes |
| RSA-OAEP | $\mathcal{A}^{\mathrm{FB}(2,0)}$ | 747.34 (0 ‰) | 1002.51 (0 ‰) | 2024.51 (0 ‰) | |
| IEEE 1363-2000 | $\mathcal{A}^{\mathrm{ALG}(2,1/2)}$ | 749.33 (0.04 ‰) | 1004.50 (0.03 ‰) | 2026.50 (0.01 ‰) | non-uniform |
| NESSIE | $\mathcal{A}^{\mathrm{FB}(4,0)}$ | 749.89 (0 ‰) | 1005.06 (0 ‰) | 2027.06 (0 ‰) | |
| GNU Crypto | $\mathcal{A}^{\mathrm{FB}(2,1)}$ | 747.89 (0.84 ‰) | 1003.06 (0.62 ‰) | 2025.06 (0.31 ‰) | non-uniform |
| GnuPG | $\mathcal{A}^{\mathrm{FB}(2,1)}$ | 748.52 (0 ‰) | 1003.69 (0 ‰) | 2025.69 (0 ‰) | |
| OpenSSL | $\approx \mathcal{A}^{\mathrm{FB}(4,0)}$ | 749.89 (0 ‰) | 1005.06 (0 ‰) | 2027.06 (0 ‰) | |
| Openswan | $\mathcal{A}^{\mathrm{FB}(4,0)}$ | 749.89 (0 ‰) | 1005.06 (0 ‰) | 2027.06 (0 ‰) | |

Table 9.1: Overview of various standards and implementations. The numbers in parentheses
give the entropy loss for each algorithm in per mille. As explained in the text, the entropy
of the standards is slightly smaller than the values given due to the fixed public exponent $e$.
FIPS 186-3 has a small entropy loss because of the requirement of strong primes. Generators
based on nonuniform prime generation suffer extra entropy loss, see page 28.

The Free Software Foundation provides GNU Classpath, which generates RSA integers
exactly like the GNU Crypto library, i.e. following $\mathcal{A}^{\mathrm{FB}(2,1)}$. We refer to the source
file named RSAKeyPairGenerator.java. As in the other implementations considered so far,
the public exponent $e$ is randomly selected afterwards such that
$\gcd((p - 1)(q - 1), e) = 1$. Like in the IEEE and the ANSI standard this does not pose practical security
risks, but it does not meet the requirement of uniform selection of the generated integers.
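
For toy sizes the entropy loss of the GNU Crypto procedure can be computed exactly rather than numerically approximated. The Python sketch below assumes that $p$ is uniform among the primes in $[2^{\lfloor (k-1)/2 \rfloor}, 2^{\lfloor (k+1)/2 \rfloor} - 1]$ and that $q$ is uniform among the primes in the same interval for which $pq$ has $k$ bits. The function name `gnu_crypto_entropy` and the toy value $k = 16$ are choices made for this illustration, and the resulting figure says nothing about the real-world sizes in Table 9.1.

```python
import math
from collections import defaultdict

def is_prime(n):
    """Trial division; adequate for the toy sizes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def gnu_crypto_entropy(k):
    """Return (entropy of n = pq in bits, log2 of the number of possible n)."""
    lo, hi = 2 ** ((k - 1) // 2), 2 ** ((k + 1) // 2) - 1
    primes = [x for x in range(lo, hi + 1) if is_prime(x)]
    prob = defaultdict(float)
    for p in primes:                      # p is chosen uniformly
        valid_q = [q for q in primes if (p * q).bit_length() == k]
        if not valid_q:
            raise ValueError("some p admits no q; the procedure would not terminate")
        for q in valid_q:                 # q is uniform among the valid choices
            prob[p * q] += 1 / (len(primes) * len(valid_q))
    entropy = -sum(w * math.log2(w) for w in prob.values())
    return entropy, math.log2(len(prob))

h, h_max = gnu_crypto_entropy(16)
print(h, h_max, h_max - h)
```

The difference `h_max - h` is the entropy loss relative to a uniform choice among the same set of products; it is positive because small $p$ leave only few admissible $q$, so those products are over-represented.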

9.10. Summary. It is striking to observe that not a single analyzed implementation follows
one of the standards described above. The only standards with which all implementations
comply are PKCS#1 and ISO 18033-2, which themselves do not specify
anything related to the integer generation routine. We found that also the requirements from
the algorithm catalog of the German Bundesnetzagentur (Wohlmacher 2009) are not met in
a single considered implementation, since it is never checked whether the selected primes
are too close to each other. The implementation that almost meets the requirements is the
implementation of OpenSSL. Interestingly there are standards and implementations around
that generate integers non-uniformly. Prominent examples are the IEEE and the ANSI standards
and the implementation of the GNU Crypto library. This does not pose practical
security issues, but it violates the condition of uniform selection.

10. Conclusion 

We have seen that there are various definitions for RSA integers, which result in substan- 
tially differing standards. We have shown that the concrete specification does not essentially 
affect the (cryptographic) properties of the generated integers: The entropy of the output 
distribution is always almost maximal, generating those integers can be done efficiently, and 
the outputs are hard to factor if factoring in general is hard in a suitable sense. It remains 
open to incorporate strong primes into our model. Also a tight bound for the entropy of 
non-uniform selection is missing if the distribution is not close to uniform. 